In the case of a sequence of Bernoulli trials, we need a few more
properties of the sequences $(X_n)$ and $(S_n)$ considered there. We have
already noted: (a) Each of the variables $X_n$ has distribution $B_1^p$.
By Theorem 5.4.4 we have: (b) The sequence $(X_n)$ is independent. We then
have, by Section 5.3, Example 1: (c) $S_n$ has distribution $B_n^p$.


PROBLEM 


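Assume that the sequence $(X_n)_{n \in \mathbb{N}}$ satisfies the strong law
of large numbers, and that $E(X_n) = E(X_1)$ for every $n \in \mathbb{N}$.
Let $f: \mathbb{R} \to \mathbb{R}$ be continuous and bounded.

Prove:

$$\lim_{n \to \infty} E\Big( f\Big( \frac{1}{n} \sum_{i=1}^{n} X_i \Big) \Big) = f(E(X_1)).$$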


6.2 ZERO-ONE LAWS 


We return momentarily to Example 4 at the conclusion of Section 5.1.
There we found that for every independent sequence $(A_n)_{n \in \mathbb{N}}$
of events of a probability space $(\Omega, \mathfrak{A}, P)$, the probability
of occurrence of infinitely many of the $A_n$ can only be equal to either
0 or 1. In symbols,

$$P(\limsup_{n \to \infty} A_n) = 0 \text{ or } = 1. \tag{6.2.1}$$

We now present a necessary and sufficient condition for the occurrence
of each of these two cases.


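6.2.1. Lemma of Borel-Cantelli. Let $(A_n)_{n \in \mathbb{N}}$ be a sequence
of events and $A = \limsup_{n \to \infty} A_n$. Then

$$\sum_{n=1}^{\infty} P(A_n) < +\infty \implies P(A) = 0. \tag{6.2.2}$$

If the sequence $(A_n)_{n \in \mathbb{N}}$ is independent, then the converse
also holds:

$$\sum_{n=1}^{\infty} P(A_n) = +\infty \implies P(A) = 1. \tag{6.2.3}$$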
Proof. From the definition of $A$ it follows that $A \subset \bigcup_{i \ge n} A_i$
for every $n \in \mathbb{N}$. Then, by the $\sigma$-subadditivity of $P$,
$P(A) \le \sum_{i=n}^{\infty} P(A_i)$ for all $n$. Since the tails of a
convergent series tend to 0, (6.2.2) follows.
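Preliminary to proving the converse we note the following: For every
sequence $(\alpha_n)$ of real numbers satisfying $0 \le \alpha_n \le 1$
$(n \in \mathbb{N})$,

$$\sum_{n=1}^{\infty} \alpha_n = +\infty \implies \lim_{n \to \infty} \prod_{\nu=1}^{n} (1 - \alpha_\nu) = 0. \tag{6.2.4}$$

We may assume $0 \le \alpha_n < 1$ for all $n$; otherwise the assertion is
trivial. Now $\log(1 - \alpha_n) \le -\alpha_n$,^1 and therefore

$$\log \prod_{\nu=1}^{n} (1 - \alpha_\nu) = \sum_{\nu=1}^{n} \log(1 - \alpha_\nu) \le -\sum_{\nu=1}^{n} \alpha_\nu.$$

Consequently,

$$\prod_{\nu=1}^{n} (1 - \alpha_\nu) \le \exp\Big( -\sum_{\nu=1}^{n} \alpha_\nu \Big);$$

that is, (6.2.4) follows.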
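Now suppose the sequence $(A_n)$ and thus also, by Corollary 5.1.4, the
sequence $(\complement A_n)$ is independent. Then

$$P\Big( \bigcap_{m=n}^{N} \complement A_m \Big) = \prod_{m=n}^{N} P(\complement A_m) = \prod_{m=n}^{N} (1 - P(A_m))$$

for any two natural numbers $n \le N$. It follows from the definition of $A$
that

$$\complement A = \bigcup_{n=1}^{\infty} \bigcap_{m \ge n} \complement A_m,$$

and we obtain

$$1 - P(A) = P(\complement A) = \lim_{n \to \infty} P\Big( \bigcap_{m \ge n} \complement A_m \Big) = \lim_{n \to \infty} \lim_{N \to \infty} P\Big( \bigcap_{m=n}^{N} \complement A_m \Big)$$

due to the continuity properties of a probability measure. Thus we obtain

$$1 - P(A) = \lim_{n \to \infty} \lim_{N \to \infty} \prod_{m=n}^{N} (1 - P(A_m)).$$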


1 This follows immediately from the power series expansion of log (1 — x) or from 
the Mean Value Theorem of differential calculus. 
If $\sum_{n} P(A_n) = +\infty$, we can apply the preliminary remark to the
sequence $\alpha_j = P(A_{j+n-1})$ $(j = 1, 2, \ldots)$ and obtain

$$\lim_{N \to \infty} \prod_{m=n}^{N} (1 - P(A_m)) = 0 \quad \text{for each } n \in \mathbb{N},$$

and thus $P(A) = 1$, which is the rest of the assertion. ∎


In the case of an independent sequence (An), the statement (6.2.1) 
together with the result obtained in Lemma 6.2.1 on the occurrence of 
both possible cases is called the Zero-One Law of E. Borel. 
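
The dichotomy in Lemma 6.2.1 can be made concrete with a small numerical sketch. The code below is an illustration only, not part of the original text: the function name `prob_none_from` and the two choices $P(A_m) = 1/m$ and $P(A_m) = 1/m^2$ are assumptions made for the example. It evaluates the product $\prod_{m=n}^{N} (1 - P(A_m))$ from the proof exactly, for independent events.

```python
from fractions import Fraction

def prob_none_from(p, n, N):
    """Exact value of prod_{m=n}^{N} (1 - p(m)): the probability that none of
    the independent events A_n, ..., A_N occurs (p is a hypothetical P(A_m))."""
    result = Fraction(1)
    for m in range(n, N + 1):
        result *= 1 - p(m)
    return result

def harmonic(m):
    # P(A_m) = 1/m: the series of probabilities diverges
    return Fraction(1, m)

def squares(m):
    # P(A_m) = 1/m^2: the series of probabilities converges
    return Fraction(1, m * m)

for N in (10, 100, 1000):
    print(N,
          float(prob_none_from(harmonic, 2, N)),
          float(prob_none_from(squares, 2, N)))
```

In the divergent case the product telescopes to $1/N$ and tends to 0, in line with (6.2.3); in the convergent case it equals $(N+1)/(2N)$ and stays bounded away from 0.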


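6.2.2. Corollary. If a sequence $(A_n)_{n \in \mathbb{N}}$ of events has an
independent subsequence $(A_{n_k})_{k \in \mathbb{N}}$ such that
$\sum_{k=1}^{\infty} P(A_{n_k}) = +\infty$, then
$P(\limsup_{n \to \infty} A_n) = 1$.


Proof. The assertion follows from Lemma 6.2.1 because
$\limsup_{k \to \infty} A_{n_k} \subset \limsup_{n \to \infty} A_n$. ∎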


Example 


A coin is tossed successively infinitely often. We wish to find the prob- 
ability that two successive heads are thrown infinitely often. 


Solution. Let $(\Omega, \mathfrak{A}, P)$ be the probability space defined at
the beginning of Section 6.1 with $p = q = \frac{1}{2}$. Let $A_n$ denote the
event that a head turns up on both the $n$th and $(n+1)$st toss. Obviously
$P(A_n) = 4^{-1}$ and thus $\sum_{n=1}^{\infty} P(A_{2n}) = +\infty$.

$A = \limsup_{n \to \infty} A_n$ is the event of interest to us. By
Corollary 6.2.2 we have $P(A) = 1$ since the sequence $(A_{2n})$ [but not
$(A_n)$] is independent.
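
As a rough empirical companion to this example, the following sketch simulates a finite run of fair coin tosses and counts how often the events $A_{2n}$ (heads on tosses $2n$ and $2n+1$) occur. The function name, the seed, and the run length are arbitrary choices made for illustration; a finite simulation can of course only suggest, not prove, the almost sure statement.

```python
import random

def count_double_heads(num_tosses, seed=0):
    """Count the events A_2, A_4, ... (heads on tosses 2n and 2n+1)
    within the first num_tosses tosses of a simulated fair coin."""
    rng = random.Random(seed)
    # tosses[i] is toss number i + 1; True means "head"
    tosses = [rng.random() < 0.5 for _ in range(num_tosses)]
    # Index i = 1, 3, 5, ... corresponds to the disjoint pairs (2,3), (4,5), ...
    return sum(tosses[i] and tosses[i + 1]
               for i in range(1, num_tosses - 1, 2))

for n in (100, 1_000, 10_000):
    print(n, count_double_heads(n))
```

Because the pairs $(2n, 2n+1)$ do not overlap, the counted events are independent, which is exactly why the corollary is applied to the subsequence $(A_{2n})$ rather than to $(A_n)$.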


In Theorem 5.1.7 we presented the more general Kolmogorov 0-1-Law.
In particular, for $\sigma$-algebras $\mathfrak{A}_n = \mathfrak{A}(X_n)$
generated by random variables $X_n$, it reads:


6.2.3. Theorem. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of
random variables (with values in arbitrary measurable spaces). Then for
every terminal event, that is, for every event

$$A \in \bigcap_{n=1}^{\infty} \mathfrak{A}(X_m;\ m \ge n),$$

we have $P(A) = 0$ or $P(A) = 1$.


We wish to consider how this 0-1-Law gives us new insight into the 
question of the validity of the strong law of large numbers. Thus, suppose 
in particular that $(X_n)_{n \in \mathbb{N}}$ is an independent sequence of
real random variables on $(\Omega, \mathfrak{A}, P)$. Further, let
$(r_n)_{n \in \mathbb{N}}$ be a sequence of real numbers satisfying
$\lim r_n = 0$. We define

$$Y_n = r_n \sum_{i=1}^{n} X_i \quad (n = 1, 2, \ldots) \tag{6.2.5}$$

and consider the set $A$ of all elementary events $\omega \in \Omega$ such
that $\lim_{n \to \infty} Y_n(\omega) = 0$. Since


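$$A = \{ \limsup_{n \to \infty} Y_n = 0 \} \cap \{ \liminf_{n \to \infty} Y_n = 0 \},$$

$A$ is an event by Theorems 2.1.3 and 2.1.5. Since $\lim r_n = 0$ and

$$r_n \sum_{i=1}^{n} X_i = r_n \sum_{i=1}^{m-1} X_i + r_n \sum_{i=m}^{n} X_i \quad (n \ge m > 1),$$

it follows that the set $A$ remains the same when we change finitely many
random variables (for example, set some equal to zero) in the sequence
$(X_n)$. Hence for every $m = 1, 2, \ldots$,

$$A = \bigcap_{k=1}^{\infty} \bigcup_{N=m}^{\infty} \bigcap_{n \ge N} \Big\{ \Big| r_n \sum_{i=m}^{n} X_i \Big| < \frac{1}{k} \Big\}.$$

Thus $A$ is a terminal event of the sequence $(X_n)$ and therefore $P(A)$ can
only be 0 or 1.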

If every $X_n$ is also integrable, then the sequence $(X_n - E(X_n))$ is again
independent (by Theorem 5.2.3). The above result applied to this sequence
and $r_n = n^{-1}$ yields that

$$P\Big\{ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (X_i - E(X_i)) = 0 \Big\}$$

is either 0 or 1. By definition, the strong law of large numbers holds
exactly when this probability is 1.

Analogously, the probability of convergence of the sequence (6.2.5) can
likewise be only 0 or 1.


PROBLEMS 
1. A coin is tossed infinitely often. Prove: With probability 1 every
prescribed finite sequence of "head" and "tail" appears infinitely
often.

2. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real random
variables, and let $(r_n)_{n \in \mathbb{N}}$ be a sequence of real numbers
converging to zero. Prove: With probability 1, the sequence $(X_n)$ [or
$(r_n \sum_{i=1}^{n} X_i)$] converges everywhere or nowhere.

3. Let $p$ be a real number in $[0,1]$. Construct a probability space
$(\Omega, \mathfrak{A}, P)$ and a sequence $(A_n)$ of events in $\mathfrak{A}$
such that

$$\sum_{n=1}^{\infty} P(A_n) = +\infty \quad \text{and} \quad P(\limsup_{n \to \infty} A_n) = p.$$

(Hint: Try $\Omega = [0,1]$ and $P = \lambda^1_\Omega$.)

4. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of centered
integrable real random variables on $(\Omega, \mathfrak{A}, P)$. Prove: If
$(X_n)$ satisfies the strong law of large numbers, then

$$\sum_{n=1}^{\infty} P\{ |X_n| \ge n \varepsilon \} < +\infty$$

for all $\varepsilon > 0$.


6.3 THE HAJEK-RENYI INEQUALITY 


All further considerations concerning the strong law of large numbers
are based on the following inequality due to J. Hájek and H. Rényi:


6.3.1. Theorem. Suppose we are given $n$ independent integrable real
random variables $X_1, \ldots, X_n$, and $n$ real numbers
$\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_n > 0$. We define

$$S_i = X_1 - E(X_1) + \cdots + X_i - E(X_i) \quad (i = 1, \ldots, n). \tag{6.3.1}$$

Then for every $m = 1, \ldots, n$ and every real $\varepsilon > 0$,

$$P\Big\{ \sup_{m \le i \le n} \gamma_i |S_i| \ge \varepsilon \Big\} \le \frac{1}{\varepsilon^2} \Big( \gamma_m^2 \sum_{j=1}^{m} V(X_j) + \sum_{j=m+1}^{n} \gamma_j^2 V(X_j) \Big). \tag{6.3.2}^2$$


Proof. By Section 4.3, Remark 2 and Theorem 5.2.3 we may assume
all $X_1, \ldots, X_n$ to be centered. Then, in particular,
$S_i = X_1 + \cdots + X_i$. Further, all the variances $V(X_j)$ can be assumed
finite since otherwise there is nothing to prove.


2 Of course, $\sum_{j=n+1}^{n} \gamma_j^2 V(X_j) = 0$ is always assumed.
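Denote by $A$ the event on the left side of (6.3.2). If we set

$$A_m = \{ \gamma_m |S_m| \ge \varepsilon \},$$
$$A_i = \{ \gamma_m |S_m| < \varepsilon \} \cap \cdots \cap \{ \gamma_{i-1} |S_{i-1}| < \varepsilon \} \cap \{ \gamma_i |S_i| \ge \varepsilon \} \quad (i = m+1, \ldots, n), \tag{6.3.3}$$

then the events $A_m, \ldots, A_n$ are pairwise disjoint and
$A = \bigcup_{i=m}^{n} A_i$, whence

$$P(A) = \sum_{i=m}^{n} P(A_i) \tag{6.3.4}$$

follows. We also set $\gamma_{n+1} = 0$ and

$$Z = \sum_{j=m}^{n} (\gamma_j^2 - \gamma_{j+1}^2) S_j^2. \tag{6.3.5}$$

Bienaymé's equality yields

$$E(S_j^2) = V(S_j) = V(X_1) + \cdots + V(X_j)$$

and consequently

$$E(Z) = \gamma_m^2 \sum_{j=1}^{m} V(X_j) + \sum_{j=m+1}^{n} \gamma_j^2 V(X_j); \tag{6.3.6}$$

thus $\varepsilon^{-2} E(Z)$ is the right side of (6.3.2). If we further set

$$Y_i = 1_{A_i} \quad (i = m, \ldots, n), \tag{6.3.7}$$

then $Z \ge 0$ and $\sum_{i=m}^{n} Y_i = 1_A \le 1$ imply the inequality

$$E(Z) \ge \sum_{i=m}^{n} E(Y_i Z);$$

thus, due to the antitonicity of the sequence $(\gamma_i)_{i=1,\ldots,n+1}$,
we obtain

$$E(Z) \ge \sum_{i=m}^{n} E(Y_i Z) = \sum_{i=m}^{n} \sum_{j=m}^{n} (\gamma_j^2 - \gamma_{j+1}^2) E(Y_i S_j^2) \ge \sum_{i=m}^{n} \sum_{j=i}^{n} (\gamma_j^2 - \gamma_{j+1}^2) E(Y_i S_j^2). \tag{6.3.8}$$

Now if

$$E(Y_i S_j^2) \ge \frac{\varepsilon^2}{\gamma_i^2} P(A_i) \quad (i = m, \ldots, n;\ j = i, \ldots, n) \tag{6.3.9}$$

holds, then (6.3.8), together with
$\sum_{j=i}^{n} (\gamma_j^2 - \gamma_{j+1}^2) = \gamma_i^2$, yields the further
bound

$$E(Z) \ge \varepsilon^2 \sum_{i=m}^{n} P(A_i),$$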
that is, yields the desired inequality because of (6.3.4) and (6.3.6). We now
show (6.3.9).
For every pair of indices $i$, $j$ considered there we set

$$U_i = S_j - S_i = \sum_{k=i+1}^{j} X_k$$

and obtain

$$E(Y_i S_j^2) = E(Y_i S_i^2) + E(Y_i U_i^2) + 2 E(Y_i S_i U_i). \tag{6.3.10}$$

By definition $S_i$, $Y_i$ and thus $S_i Y_i$ are
$\mathfrak{A}(X_1, \ldots, X_i)$-measurable and $U_i$ is
$\mathfrak{A}(X_{i+1}, \ldots, X_n)$-measurable (the latter for
$i = 1, \ldots, n-1$). Due to the independence of the $(X_i)$, both
$\sigma$-algebras are independent by Corollary 5.1.5; thus, $Y_i S_i$ and
$U_i$ are independent ($i = 1, \ldots, n-1$). Since $U_n = 0$, this also holds
for $i = n$. But then, by Theorem 5.3.1,

$$E(Y_i S_i U_i) = E(Y_i S_i) E(U_i) = 0.$$

If we substitute this result into (6.3.10), then from (6.3.3) we obtain

$$E(Y_i S_j^2) \ge E(Y_i S_i^2) = \int_{A_i} S_i^2 \, dP \ge \frac{\varepsilon^2}{\gamma_i^2} P(A_i).$$

But this is the inequality (6.3.9). ∎


We emphasize two special cases of inequality (6.3.2):


(a) Suppose we choose $m = n = 1$ and $\gamma_1 = 1$. Then for every
integrable real random variable $X$ we have the Chebyshev inequality:

$$P\{ |X - E(X)| \ge \varepsilon \} \le \varepsilon^{-2} V(X) \quad (\varepsilon > 0). \tag{6.3.11}$$

This obviously also follows from the Chebyshev-Markov inequality
derived in Lemma 2.11.1 by choosing $p = 2$. If $X$ has finite variance, then
(6.3.11) tells us that large deviations of the random variable $X$ from the
expected value have small probability.

(b) Suppose we choose $m = 1$ and $\gamma_1 = \cdots = \gamma_n = 1$. Then for
any $n$ independent real integrable random variables $X_1, \ldots, X_n$ and
every $\varepsilon > 0$ we obtain the Kolmogorov inequality:

$$P\Big\{ \sup_{1 \le j \le n} \Big| \sum_{i=1}^{j} (X_i - E(X_i)) \Big| \ge \varepsilon \Big\} \le \frac{1}{\varepsilon^2} \sum_{j=1}^{n} V(X_j). \tag{6.3.12}$$

But the most important application of (6.3.2) for our purposes is contained
in the proof of Theorem 6.4.1.
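
The Kolmogorov inequality (6.3.12) can be checked exactly in a small case. The sketch below is an illustration under assumed choices that are not in the text: $n$ independent fair $\pm 1$ steps (each centered with variance 1), and a hypothetical helper `max_partial_sum_tail` that enumerates all $2^n$ paths.

```python
from fractions import Fraction
from itertools import product

def max_partial_sum_tail(n, eps):
    """Exact P{ max_{1<=j<=n} |S_j| >= eps } for n independent fair +-1 steps,
    obtained by enumerating all 2**n equally likely paths."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        partial, peak = 0, 0
        for x in steps:
            partial += x
            peak = max(peak, abs(partial))
        if peak >= eps:
            hits += 1
    return Fraction(hits, 2 ** n)

n, eps = 10, 4
lhs = max_partial_sum_tail(n, eps)   # left side of (6.3.12)
rhs = Fraction(n, eps ** 2)          # right side: sum of variances / eps^2
print(float(lhs), float(rhs), lhs <= rhs)
```

Note that the bound controls the maximum of all partial sums, not only the final sum $S_n$, at the same price as the Chebyshev bound for $S_n$ alone.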
6.4 THE KOLMOGOROV THEOREMS


We now give a first answer to the question raised in Section 6.1: 


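6.4.1. Theorem (Kolmogorov^3). Let $(X_n)_{n \in \mathbb{N}}$ be an
independent sequence of integrable real random variables. If

$$\sum_{n=1}^{\infty} \frac{V(X_n)}{n^2} < +\infty, \tag{6.4.1}$$

then the sequence $(X_n)$ satisfies the strong law of large numbers.


Proof. For $n = 1, 2, \ldots$ set

$$Y_n = \frac{1}{n} \sum_{i=1}^{n} (X_i - E(X_i)) \quad \text{and} \quad V_n = V(X_n).$$

If we choose $\gamma_i = i^{-1}$ $(i = 1, \ldots, n)$ in Theorem 6.3.1, then all
the hypotheses there are obviously satisfied; it thus follows from inequality
(6.3.2) that for every real $\varepsilon > 0$ and every pair of natural
numbers $m < n$,

$$P\Big\{ \sup_{m \le i \le n} |Y_i| > \varepsilon \Big\} \le \frac{1}{\varepsilon^2} \Big( \frac{1}{m^2} \sum_{j=1}^{m} V_j + \sum_{j=m+1}^{n} \frac{V_j}{j^2} \Big).$$

Letting $n \to \infty$ we have

$$\Big\{ \sup_{m \le i \le n} |Y_i| > \varepsilon \Big\} \uparrow \Big\{ \sup_{i \ge m} |Y_i| > \varepsilon \Big\},$$

which, together with the continuity property of $P$, yields

$$P\Big\{ \sup_{i \ge m} |Y_i| > \varepsilon \Big\} \le \frac{1}{\varepsilon^2} \Big( \frac{1}{m^2} \sum_{j=1}^{m} V_j + \sum_{j=m+1}^{\infty} \frac{V_j}{j^2} \Big).$$

For every natural number $M < m$ we can further majorize the right side:

$$P\Big\{ \sup_{i \ge m} |Y_i| > \varepsilon \Big\} \le \frac{1}{\varepsilon^2} \Big( \frac{1}{m^2} \sum_{j=1}^{M} V_j + \sum_{j=M+1}^{\infty} \frac{V_j}{j^2} \Big).$$

Now $\lim_{m \to \infty} m^{-2} \sum_{j=1}^{M} V_j = 0$ for every $M$ and

$$\lim_{M \to \infty} \sum_{j=M+1}^{\infty} \frac{V_j}{j^2} = 0$$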

3 Theorem 6.4.1 is often called the Kolmogorov criterion. 
by (6.4.1). Thus

$$\lim_{m \to \infty} P\Big\{ \sup_{i \ge m} |Y_i| > \varepsilon \Big\} = 0$$

for every $\varepsilon > 0$. Then by Lemma 2.11.5, $(Y_n)$ approaches 0 almost
surely, that is, the sequence $(X_n)$ satisfies the strong law of large
numbers. ∎


If the random variables $X_n$ all have the same distribution $\mu$ and if the
variance

$$V(X_n) = \int x^2 \, \mu(dx) - \Big( \int x \, \mu(dx) \Big)^2,$$

which is independent of $n$ by (4.3.10), is finite, then (6.4.1) is satisfied.
This holds for the sequence $(X_n)$ of Section 6.1 describing the sequence of
Bernoulli trials. The theorem of E. Borel given there thus follows from
Theorem 6.4.1. The following theorem now shows that the finiteness of
$V(X_n)$ is no longer needed when all $X_n$ have the same distribution, that
is, in our previous terminology when they are identically distributed.


6.4.2. Theorem (Kolmogorov). Every independent sequence
$(X_n)_{n \in \mathbb{N}}$ of integrable real and identically distributed
random variables satisfies the strong law of large numbers.


Proof. Let $\mu$ be the distribution of the $X_n$, that is,

$$P_{X_n} = \mu \quad (n \in \mathbb{N}).$$

In part 1 of the proof the variables $X_n$ will be "truncated" in such a way
that Theorem 6.4.1 can be applied. We repeatedly use the transformation
formula (4.3.6) without explicit mention.


1. For every natural number $n$, let

$$I_n = \, ]-n, +n[, \quad H_n = \{ x \in \mathbb{R} : n - 1 \le |x| < n \}$$

and

$$f_n(x) = x \, 1_{I_n}(x) \quad (x \in \mathbb{R}).^4$$

By Theorem 5.2.3

$$Y_n = f_n \circ X_n \quad (n \in \mathbb{N})$$

is then an independent sequence of real random variables. The common
distribution is generally lost in the transition from $(X_n)$ to $(Y_n)$, but
the hypotheses of Theorem 6.4.1 are satisfied for the sequence $(Y_n)$: $Y_n$
is square integrable due to the boundedness of $Y_n$. By (4.3.10) we obtain
for the variance of $Y_n$

$$V(Y_n) \le E(Y_n^2) = \int f_n^2 \, d\mu = \int_{I_n} x^2 \, \mu(dx) = \sum_{j=1}^{n} \int_{H_j} x^2 \, \mu(dx)$$


4 $1_{I_n}$ is of course the indicator function of $I_n$ with respect to $\mathbb{R}$.
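and, by substituting and rearranging,

$$\sum_{n=1}^{\infty} \frac{V(Y_n)}{n^2} \le \sum_{n=1}^{\infty} \frac{1}{n^2} \sum_{j=1}^{n} \int_{H_j} x^2 \, \mu(dx) = \sum_{j=1}^{\infty} \Big( \sum_{n=j}^{\infty} \frac{1}{n^2} \Big) \int_{H_j} x^2 \, \mu(dx).$$

If we take note of the inequality

$$\sum_{n=j}^{\infty} \frac{1}{n^2} \le \frac{1}{j^2} + \sum_{n=j+1}^{\infty} \frac{1}{n(n-1)} = \frac{1}{j^2} + \frac{1}{j} \le \frac{2}{j} \quad (j = 1, 2, \ldots),$$

we finally obtain the condition (6.4.1) for $(Y_n)$:

$$\sum_{n=1}^{\infty} \frac{V(Y_n)}{n^2} \le 2 \sum_{j=1}^{\infty} \frac{1}{j} \int_{H_j} x^2 \, \mu(dx) \le 2 \sum_{j=1}^{\infty} \int_{H_j} |x| \, \mu(dx) = 2 \int |x| \, \mu(dx) < +\infty,$$

where the second inequality uses $|x| < j$ on $H_j$.

By Theorem 6.4.1 the sequence $(Y_n)$ then satisfies the strong law of large
numbers, and thus there exists an event $A$ such that $P(A) = 1$ and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (Y_i(\omega) - E(Y_i)) = 0, \quad \text{for all } \omega \in A.$$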


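2. Now let $B$ be the set of all elementary events $\omega \in \Omega$
satisfying $X_n(\omega) \ne Y_n(\omega)$ for at most finitely many
$n = 1, 2, \ldots$. Then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (X_i(\omega) - E(Y_i)) = 0, \quad \text{for all } \omega \in A \cap B,$$

which is shown by the same reasoning used in the lines following (6.2.5).
Therefore, we also have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (X_i(\omega) - E(X_i)) = 0, \quad \text{for all } \omega \in A \cap B,$$

provided

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (E(Y_i) - E(X_i)) = 0.$$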
But this can be seen as follows: All $X_n$ are identically distributed and
thus have the same expected value $\alpha$. We have

$$\alpha = \int x \, \mu(dx) = \lim_{n \to \infty} \int f_n \, d\mu = \lim_{n \to \infty} E(f_n \circ X_n) = \lim_{n \to \infty} E(Y_n),$$

since

$$|f_n(x)| \le |x| \quad \text{and} \quad \int |x| \, \mu(dx) = E(|X_n|) < +\infty$$

allow us to use Theorem 2.7.4 on dominated convergence. But when
$(E(Y_n) - \alpha)$ is a null sequence, the sequence of arithmetic means,
$n^{-1} \sum_{i=1}^{n} (E(Y_i) - \alpha)$, is then also a null sequence. Thus,
the theorem is proved if $B$ is an almost sure event. Then it would follow
from $P(A) = P(B) = 1$ that $P(A \cap B) = 1$, since
$P(\complement(A \cap B)) \le P(\complement A) + P(\complement B) = 0$.


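3. Let

$$C_n = \{ X_n \ne Y_n \} \quad \text{and} \quad C = \limsup_{n \to \infty} C_n.$$

Then $B = \Omega \setminus C$ is obviously an event and we want to show that
$P(C) = 0$. By Lemma 6.2.1 it suffices to verify the convergence of the
series $\sum_{n=1}^{\infty} P(C_n)$. We have $C_n = \{ |X_n| \ge n \}$ and thus

$$P(C_n) = \mu(\complement I_n) = \sum_{j=n+1}^{\infty} \mu(H_j).$$

Hence we obtain

$$\sum_{n=1}^{\infty} P(C_n) = \sum_{n=1}^{\infty} \sum_{j=n+1}^{\infty} \mu(H_j) = \sum_{j=2}^{\infty} (j - 1) \mu(H_j) \le \sum_{j=2}^{\infty} \int_{H_j} |x| \, \mu(dx) \le \int |x| \, \mu(dx). \tag{6.4.2}$$

Since $\int |x| \, \mu(dx) = E(|X_n|) < +\infty$, $n = 1, 2, \ldots$, this series
is indeed convergent. ∎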


Under the hypotheses of Theorem 6.4.2, the strong law of large numbers
tells us that $(n^{-1} \sum_{i=1}^{n} X_i)$ converges almost surely to a
constant $\alpha$, indeed, $\alpha = E(X_1)$. In this formulation the expected
value is no longer used explicitly and we might try to eliminate the
hypothesis of integrability of $X_n$. But this is not possible, as is shown
by the next result:


6.4.3. Theorem. Suppose for an independent sequence
$(X_n)_{n \in \mathbb{N}}$ of real identically distributed random variables,
that the sequence $((1/n) \sum_{i=1}^{n} X_i)$ converges almost surely to a
real random variable $Y$ as $n \to \infty$. Then every $X_n$ is integrable and
$Y$ is almost surely constant, that is,

$$Y = E(X_n) \text{ almost surely} \quad (n = 1, 2, \ldots). \tag{6.4.3}$$


Proof. The assumed almost sure convergence of the sequence

$$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i \quad (n \in \mathbb{N})$$

implies the almost sure convergence of the sequence

$$\frac{1}{n} X_n = Y_n - \frac{n-1}{n} Y_{n-1} \quad (n \in \mathbb{N})$$

to 0. Thus $|X_n| \ge n$ for infinitely many $n$ only occurs with probability
0. If $C_n$ again denotes the event $\{ |X_n| \ge n \}$, then
$P(\limsup C_n) = 0$. Since the sequence $(C_n)$ is independent whenever
$(X_n)$ is, the series $\sum_{n=1}^{\infty} P(C_n)$ is convergent by the
Borel-Cantelli Lemma. Now if $H_j$, $j = 1, 2, \ldots$, is again defined as in
the beginning of the proof of Theorem 6.4.2, then by (6.4.2) we have for the
common distribution $\mu$ of the $X_n$:

$$\sum_{j=2}^{\infty} (j - 1) \mu(H_j) = \sum_{n=1}^{\infty} P(C_n).$$

The convergence of this series and

$$E(|X_n|) = \int |x| \, \mu(dx) = \sum_{j=1}^{\infty} \int_{H_j} |x| \, \mu(dx) \le \sum_{j=1}^{\infty} j \, \mu(H_j) = 1 + \sum_{j=2}^{\infty} (j - 1) \mu(H_j)$$

imply the integrability of each of the $X_n$. But then Theorem 6.4.2 can be
applied, and thus $Y_n$ converges almost surely to the common expected
value $\alpha$ of all the $X_n$. Hence $Y = \alpha$ almost surely. ∎
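
The two estimates used in these proofs combine to the sandwich $\sum_{n \ge 1} P\{|X| \ge n\} \le E(|X|) \le 1 + \sum_{n \ge 1} P\{|X| \ge n\}$, which is why the convergence of $\sum P(C_n)$ is equivalent to integrability. The sketch below verifies this exactly for one finitely supported distribution. The distribution and the helper names `tail_sum` and `abs_mean` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
import math

def tail_sum(dist):
    """sum_{n>=1} P{|X| >= n} for a finitely supported distribution,
    given as a dict mapping values to probabilities."""
    top = math.floor(max(abs(x) for x in dist))
    return sum(
        sum(p for x, p in dist.items() if abs(x) >= n)
        for n in range(1, top + 1)
    )

def abs_mean(dist):
    """E(|X|) for the same representation."""
    return sum(abs(x) * p for x, p in dist.items())

# Hypothetical example distribution with three atoms
dist = {
    Fraction(-7, 2): Fraction(1, 4),
    Fraction(1, 3): Fraction(1, 4),
    Fraction(5, 2): Fraction(1, 2),
}

lower_bound = tail_sum(dist)
mean_abs = abs_mean(dist)
print(lower_bound, mean_abs, lower_bound + 1)
```

For this example the three printed numbers are increasing, as the sandwich requires.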


Example 


Let $(\Omega, \mathfrak{A}, P)$ be the probability space defined in Section
5.1, Example 3 and let $(A_n)$ be the independent sequence of events given
there. We consider the independent sequence $(X_n)$ of the associated
indicator variables $X_n = 1_{A_n}$. Since $P(A_n) = \frac{1}{2}$, every $X_n$
has distribution $B_1^p$ with $p = \frac{1}{2}$, and thus, by Theorem 6.4.2,
$(X_n)$ satisfies the strong law of large numbers. Therefore, $P$-almost
surely,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{2}.$$

This result can be interpreted as follows:
For every natural number $g \ge 2$, every number $x \in \Omega = [0,1[$ has
exactly one $g$-adic expansion

$$x = \sum_{n=1}^{\infty} \xi_n g^{-n}$$

in which the $\xi_n$ can take on the values $0, 1, \ldots, g - 1$ and in which
$\xi_n = g - 1$ does not hold for all sufficiently large $n$. For every
$\varepsilon \in \{0, 1, \ldots, g - 1\}$ let $S_n^{\varepsilon, g}(x)$ denote
the number of integers among $i = 1, \ldots, n$ such that
$\xi_i = \varepsilon$ in the $g$-adic expansion of $x$. We say that
$x \in \Omega$ is $g$-normal if

$$\lim_{n \to \infty} \frac{1}{n} S_n^{\varepsilon, g}(x) = \frac{1}{g} \quad \text{for each } \varepsilon = 0, 1, \ldots, g - 1.$$

The number $x$ is said to be absolutely normal if $x$ is $g$-normal for all
$g \ge 2$. For $g = 2$ we obviously have

$$S_n^{1,2}(x) = \sum_{i=1}^{n} X_i(x) \quad \text{and} \quad S_n^{0,2}(x) = n - S_n^{1,2}(x).$$

The strong law of large numbers says precisely that $P$-almost all numbers
$x \in \Omega$ are 2-normal. It can be shown analogously that $P$-almost all
numbers $x \in \Omega$ are $g$-normal for arbitrary $g \ge 2$. Since the union
of countably many null sets is again a null set, it finally follows that
$P$-almost all $x \in \Omega$ are absolutely normal.
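
For $g = 2$ the statement can be explored numerically, since the binary digits of a uniformly chosen $x \in [0,1[$ are independent fair bits. The following sketch, with an arbitrary seed and a hypothetical helper name, simulates such digits and reports the relative frequency of the digit 1. It illustrates the limit $\frac{1}{2}$ but proves nothing about any particular number.

```python
import random

def ones_frequency(num_digits, seed=0):
    """Relative frequency of the digit 1 among the first num_digits binary
    digits of a pseudo-randomly chosen x in [0, 1[ (digits are drawn as
    independent fair bits)."""
    rng = random.Random(seed)
    digits = [rng.getrandbits(1) for _ in range(num_digits)]
    return sum(digits) / num_digits

for n in (100, 10_000, 100_000):
    print(n, ones_frequency(n))
```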


PROBLEMS 


1. Let $\lambda$ be a real number, and let $(X_n)_{n \in \mathbb{N}}$ be a
corresponding independent sequence of real random variables with the
properties stated in part (b) of the problem of Section 5.4. Prove:

(a) $\sum_{n=1}^{\infty} \frac{V(X_n)}{n^2}$ converges if and only if
$\lambda < \frac{1}{2}$.
(b) The sequence satisfies the strong law of large numbers for
$\lambda < \frac{1}{2}$.
(c) For $\lambda \ge 1$ the strong law fails.
[Hint: Use Section 6.2, Problem 4.] (For the case
$\frac{1}{2} \le \lambda < 1$ see Section 9.2, Problem 6.)
2. Let $(X_n)_{n \in \mathbb{N}}$ and $(Y_n)_{n \in \mathbb{N}}$ be sequences of
real, integrable random variables such that $E(X_n) = E(Y_n)$ for all $n$.
Prove: If $(X_n)$ satisfies the strong law of large numbers and if

$$\sum_{n=1}^{\infty} P\{ X_n \ne Y_n \} < +\infty,$$

then also $(Y_n)$ satisfies the strong law.

3. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real random
variables with the properties mentioned in part (a) of the Problem of
Section 5.4. Prove: The sequence satisfies the strong law of large numbers
but $\sum_{n=1}^{\infty} [V(X_n)/n^2]$ diverges.

[Hint: Introduce $Y_n = \sup(-2, \inf(X_n, 2))$ and use Problem 2.]
[This proves that Kolmogorov's criterion (6.4.1) is sufficient but not
necessary for the strong law.]

4. Let $(\sigma_n)_{n \in \mathbb{N}}$ be a sequence of positive real numbers
such that $\sum_{n=1}^{\infty} (\sigma_n^2/n^2)$ diverges. Prove the existence
of an independent sequence $(X_n)$ of integrable random variables with
variances $V(X_n) = \sigma_n^2$ $(n \in \mathbb{N})$ for which the strong law
fails. [Hint: Introduce $\alpha_n = \max(\sigma_n, n)$ and
$\beta_n = \min(\sigma_n, n)$; define $X_n$ in such a way that

$$P\{X_n = \alpha_n\} = P\{X_n = -\alpha_n\} = \frac{\beta_n^2}{2n^2}, \quad P\{X_n = 0\} = 1 - \frac{\beta_n^2}{n^2},$$

and apply Section 6.2, Problem 4.]


6.5 WEAK LAW OF LARGE NUMBERS 


We now turn back for a moment to the sequence of Bernoulli trials
considered in Section 6.1 and the corresponding sequence $(X_n)$ of
independent random variables describing them. Since
$S_n = \sum_{i=1}^{n} X_i$ has distribution $B_n^p$, by (4.4.2) and (4.4.4) we
have

$$E(n^{-1} S_n) = p$$

and

$$V(n^{-1} S_n) = n^{-2} V(S_n) = n^{-1} pq.$$

For every real $\varepsilon > 0$ the Chebyshev inequality (6.3.11) yields

$$P\Big\{ \Big| \frac{1}{n} S_n - p \Big| \ge \varepsilon \Big\} \le \frac{pq}{n \varepsilon^2},$$

and therefore $P\{ |(1/n) S_n - p| \ge \varepsilon \} \to 0$ as
$n \to \infty$. This fact, already known to Jacob Bernoulli, again shows that
$((1/n) S_n)$ approaches $p$ "with high probability." What is new here is
that the almost sure convergence is replaced by the stochastic convergence
(with respect to $P$) introduced in Definition 2.11.2. We now define:
6.5.1. Definition. A sequence $(X_n)_{n \in \mathbb{N}}$ of integrable real
random variables on a probability space $(\Omega, \mathfrak{A}, P)$ is said to
satisfy the weak law of large numbers if

$$P\text{-}\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (X_i - E(X_i)) = 0.^5 \tag{6.5.1}$$


The designation ‘‘weak law" is justified by Theorem 2.11.4, according to 
which stochastic convergence can always be concluded from the almost 
sure convergence of a sequence of random variables. The validity of the 
strong law of large numbers thus always implies the validity of the weak 
law of large numbers. Hence, in particular, the above result of J. Ber- 
noulli is a corollary to the theorem of E. Borel of Section 6.1. 
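
To see how the Chebyshev bound $pq/(n \varepsilon^2)$ for Bernoulli trials compares with the true deviation probability, the sketch below computes $P\{ |S_n/n - p| \ge \varepsilon \}$ exactly from the binomial distribution. The parameter values and function names are assumptions chosen for illustration only.

```python
from fractions import Fraction
from math import comb

def deviation_probability(n, p, eps):
    """Exact P{ |S_n / n - p| >= eps } for S_n binomial(n, p);
    p and eps are Fractions so that all arithmetic is exact."""
    q = 1 - p
    return sum(
        comb(n, k) * p ** k * q ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) >= eps
    )

def chebyshev_bound(n, p, eps):
    """Right side of the Chebyshev estimate: p q / (n eps^2)."""
    return p * (1 - p) / (n * eps ** 2)

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n,
          float(deviation_probability(n, p, eps)),
          float(chebyshev_bound(n, p, eps)))
```

The bound is far from sharp, but it already tends to 0 as $n \to \infty$, which is all the weak law requires.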


Because of these relationships, it should not be surprising that relatively
weak hypotheses are sufficient for the validity of the weak law of large
numbers. As an example, we have the following result:


6.5.2. Theorem. If

$$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} V(X_i) = 0 \tag{6.5.2}$$

for a sequence $(X_n)_{n \in \mathbb{N}}$ of real, integrable, pairwise
uncorrelated random variables, then the sequence satisfies the weak law of
large numbers.


Proof. If we define $S_n = X_1 - E(X_1) + \cdots + X_n - E(X_n)$, then
by Theorem 5.3.3

$$V(S_n) = \sum_{i=1}^{n} V(X_i),$$

and thus

$$V\Big( \frac{1}{n} S_n \Big) = \frac{1}{n^2} \sum_{i=1}^{n} V(X_i)$$

for all $n \in \mathbb{N}$. The assertion then follows from the Chebyshev
inequality. ∎


Condition (6.5.2) is satisfied, for example, if the sequence (V(X,)) of 
variances is bounded. This is true, for instance, if the random variables 


5 We thus have stochastic convergence to the constant random variable 0. 
$X_n$ are all identically distributed and square integrable. By reasoning
similar to that in the proof of Theorem 6.4.2 we can eliminate the
finiteness of the variances and thus obtain a theorem of A. J. Khinchin (see,
for example, Rényi [19]).


PROBLEMS 


1. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of centered
integrable real random variables on $(\Omega, \mathfrak{A}, P)$. Prove: If
$(X_n)$ satisfies the weak law of large numbers, then the sequence
$((1/n) X_n)$ converges stochastically to 0, that is,

$$\lim_{n \to \infty} P\Big\{ \frac{1}{n} |X_n| \ge \varepsilon \Big\} = 0$$

for all $\varepsilon > 0$. Compare this result with Section 6.2, Problem 4.


2. Apply Problem 1 to situation (c) of Section 6.4, Problem 1 and prove
that even the weak law of large numbers fails.
MEASURES ON
TOPOLOGICAL SPACES


To develop probability theory further, we need new properties of finite
Borel measures on $\mathbb{R}^n$. Here we shall use only such properties of
$\mathbb{R}^n$ which follow, on the one hand, from the existence of a
countable base and, on the other hand, from the local compactness and
completeness of $\mathbb{R}^n$ with respect to the Euclidean metric.
Therefore, we carry out our investigations both for locally compact spaces
with countable base and the more general class of Polish spaces. At the same
time we shall get new insight into the meaning of the concept of integral.
Finally a concept of convergence particularly suitable to Borel measures
will lead us back to probabilistic considerations.


7.1 THE DANIELL-STONE THEOREM 


Suppose we are given a set $\Omega$ and a set $\mathcal{F}$ of real functions
on $\Omega$. We assume that $\mathcal{F}$ is a vector space (over
$\mathbb{R}$), that is, along with $u, v \in \mathcal{F}$ and
$\alpha, \beta \in \mathbb{R}$, the function $\alpha u + \beta v$ defined on
$\Omega$ by

$$\omega \mapsto \alpha u(\omega) + \beta v(\omega)$$

also lies in $\mathcal{F}$. Suppose further that when $\mathcal{F}$ contains
$u$, it also contains the function $|u|$ defined on $\Omega$ by

$$\omega \mapsto |u(\omega)|.$$

Then with any two functions $u, v \in \mathcal{F}$, their upper envelope
$\sup(u,v)$ and lower envelope $\inf(u,v)$ also lie in $\mathcal{F}$; indeed,

$$\sup(u,v) = \tfrac{1}{2}(u + v + |u - v|),$$
$$\inf(u,v) = -\sup(-u,-v) = \tfrac{1}{2}(u + v - |u - v|).$$
Then, in particular, every function $u \in \mathcal{F}$ is the difference of
nonnegative functions in $\mathcal{F}$, for example, the positive part

$$u^+ = \sup(u, 0)$$

and the negative part

$$u^- = (-u)^+ = \sup(-u, 0)$$

of $u$. In the sense of lattice theory, $\mathcal{F}$ is thus a vector lattice
(or Riesz space). Let $\mathcal{F}_+$ denote the set of functions
$u \in \mathcal{F}$ satisfying $u \ge 0$. If $\mathcal{F}$ also satisfies the
following condition (7.1.1) of M. H. Stone [43], then we call $\mathcal{F}$ a
Stone vector lattice (of real functions on $\Omega$). The condition is

$$\inf(u, 1) \in \mathcal{F}, \quad \text{for all } u \in \mathcal{F}. \tag{7.1.1}$$

In particular, it is satisfied when the constant function 1 (and hence every
constant real function on $\Omega$) lies in $\mathcal{F}$.
If we note the equality

$$\inf(u, \alpha) = \alpha \inf\Big( \frac{1}{\alpha} u, 1 \Big),$$

which holds for all $\alpha > 0$, then it follows that

$$\inf(u, \alpha) \in \mathcal{F}, \quad \text{for all } u \in \mathcal{F} \text{ and all } \alpha \in \mathbb{R}_+, \tag{7.1.2}$$

in every Stone vector lattice.
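
The lattice identities above are easy to check pointwise. The following sketch is illustrative only: the function names and the sample functions `u`, `v` are assumptions, not part of the text. It builds the envelopes from the absolute value alone and $\inf(u, \alpha)$ from the Stone operation $u \mapsto \inf(u, 1)$ alone, then compares them with the pointwise maximum and minimum on a grid of rational points.

```python
from fractions import Fraction

def upper(u, v):
    """sup(u, v) expressed through the absolute value only."""
    return lambda w: (u(w) + v(w) + abs(u(w) - v(w))) / 2

def lower(u, v):
    """inf(u, v) expressed through the absolute value only."""
    return lambda w: (u(w) + v(w) - abs(u(w) - v(w))) / 2

def stone(u):
    """The Stone operation u -> inf(u, 1)."""
    return lambda w: min(u(w), 1)

def lower_const(u, alpha):
    """inf(u, alpha) for alpha > 0, obtained as alpha * inf(u / alpha, 1)."""
    scaled = stone(lambda w: u(w) / alpha)
    return lambda w: alpha * scaled(w)

# Hypothetical sample functions on a grid of rational points
u = lambda w: w * w - 2
v = lambda w: 1 - w
points = [Fraction(k, 4) for k in range(-12, 13)]
alpha = Fraction(3, 2)

identities_hold = all(
    upper(u, v)(w) == max(u(w), v(w))
    and lower(u, v)(w) == min(u(w), v(w))
    and lower_const(u, alpha)(w) == min(u(w), alpha)
    for w in points
)
print(identities_hold)
```

Exact rational arithmetic is used so that the comparisons are genuine equalities rather than floating-point approximations.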


Examples 


1. Let $\Omega$ be a set and $\mathcal{R}$ a ring of sets in $\Omega$. Then
the set $\mathcal{F} = \mathcal{F}(\mathcal{R})$ of all linear combinations

$$u = \sum_{i=1}^{n} \alpha_i 1_{A_i}$$

of indicator functions $1_{A_i}$ of sets $A_i \in \mathcal{R}$
$(n \in \mathbb{N};\ \alpha_1, \ldots, \alpha_n \in \mathbb{R})$ is a
Stone vector lattice. To see this, we need only take the sets
$A_1, \ldots, A_n$ as pairwise disjoint. That this is always possible is shown
by the following reasoning: Each set $A_i$ is the union of the $k = 2^{n-1}$
pairwise disjoint sets $B_1, \ldots, B_k \in \mathcal{R}$, where each $B_j$ is
of the form $C_1 \cap \cdots \cap C_n$ with $C_\nu = A_\nu$ or
$C_\nu = \complement A_\nu$ $(\nu = 1, \ldots, n)$ and with $C_i = A_i$. We then
have

$$1_{A_i} = 1_{B_1} + \cdots + 1_{B_k}.$$

We note that the statements $1 \in \mathcal{F}$ and $\Omega \in \mathcal{R}$
are equivalent.


2. Let $\Omega = [0,1]$ and let $\mathcal{F}$ be the set of all continuous real
functions $u$ on $\Omega$ which are differentiable (on the right) at the point
0 and satisfy $u(0) = 0$. $\mathcal{F}$ is a Stone vector lattice.
3. For every measure space $(\Omega, \mathfrak{A}, \mu)$ and every number
$p \in [1, +\infty]$, $\mathcal{L}^p(\mu)$ is a Stone vector lattice. It follows
from Definition 2.6.3 and (2.6.6) that when the vector space
$\mathcal{L}^p(\mu)$ contains a function $u$, it also contains $|u|$. Since
$|\inf(u,1)| \le |u|$, we have that $\inf(u,1)$ is an $\mathcal{L}^p$-function
$(1 \le p < +\infty)$ whenever $u$ is, and $\mu$-almost everywhere bounded
$(p = +\infty)$ whenever $u$ is. The constant function 1 lies in
$\mathcal{L}^p(\mu)$ only for a finite measure $\mu$, provided
$1 \le p < +\infty$.

Of particular interest to us will be the Stone vector lattice
$\mathcal{L}^1(\mu)$. According to Section 2.4, $f \mapsto \int f \, d\mu$ is a
positive linear form on $\mathcal{L}^1(\mu)$ which, according to the theorem
on monotone convergence, is an abstract integral in the sense of the
following definition.


7.1.1. Definition. An abstract integral is a linear form $I$ (that is, a
linear mapping $I: \mathcal{F} \to \mathbb{R}$) defined on a Stone vector
lattice $\mathcal{F}$ of real functions with the following two properties:

$I$ is positive, that is,

$$I(u) \ge 0, \quad \text{for all } u \in \mathcal{F}_+;^1 \tag{7.1.3}$$

for every isotone sequence $(u_n)_{n \in \mathbb{N}}$ in $\mathcal{F}_+$ with
upper envelope $\sup u_n$ in $\mathcal{F}$,

$$I\big( \sup_{n \in \mathbb{N}} u_n \big) = \sup_{n \in \mathbb{N}} I(u_n). \tag{7.1.4}$$

Taken with the other properties of $I$, the "continuity property" (7.1.4) is
equivalent to:
For every antitone sequence $(u_n)_{n \in \mathbb{N}}$ in $\mathcal{F}$ with
$\inf_{n \in \mathbb{N}} u_n = 0$,

$$\lim_{n \to \infty} I(u_n) = 0. \tag{7.1.4'}$$

In fact, if $(u_n)$ is an isotone sequence in $\mathcal{F}_+$ with
$u = \sup u_n \in \mathcal{F}$, then the sequence $(u - u_n)$ is antitone and
$\inf(u - u_n) = 0$. Conversely, for every antitone sequence $(v_n)$ in
$\mathcal{F}$ with $\inf v_n = 0$, the sequence $(v_1 - v_n)$ is isotone,
$v_1 - v_n \in \mathcal{F}_+$ for all $n \in \mathbb{N}$ and
$v_1 = \sup(v_1 - v_n)$.


Justification of the designation "abstract integral" is given by the 
Daniell-Stone Theorem below, which shows in a very precise sense that 
every abstract integral is a “concrete” one. 

The following concept serves as preparation for the proof and several 
consequences of this theorem: 


7.1.2. Definition. Let $\mathcal{F}$ be a Stone vector lattice of real
functions on a set $\Omega$. A set $G \subset \Omega$ is said to be
$\mathcal{F}$-open if there exists an isotone


1 The isotonicity of $I$ is equivalent to this, as follows immediately from
$I(v) - I(u) = I(v - u)$ $(u, v \in \mathcal{F})$.
sequence $(u_n)_{n \in \mathbb{N}}$ in $\mathcal{F}_+$ such that

$$1_G = \sup u_n.$$

Let $\mathfrak{G} = \mathfrak{G}(\mathcal{F})$ denote the system of all
$\mathcal{F}$-open sets.


If (similarly to Section 2.3) we let $\mathcal{F}^\sigma$ denote the set of all
numerical functions $f \ge 0$ on $\Omega$ which are upper envelopes of isotone
sequences $(u_n)$ in $\mathcal{F}_+$, then we can also formulate Definition
7.1.2 as

$$\mathfrak{G} = \{ G \in \mathfrak{P}(\Omega) : 1_G \in \mathcal{F}^\sigma \}. \tag{7.1.5}$$


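7.1.3. Lemma. The system $\mathfrak{G}$ of $\mathcal{F}$-open sets has the
following properties:

(1) $\{ f > \alpha \} \in \mathfrak{G}$ for every $f \in \mathcal{F}^\sigma$ and
every real number $\alpha \ge 0$.

(2) $\mathfrak{G}$ is an $\cap$-stable generator of the smallest
$\sigma$-algebra $\mathfrak{A}(\mathcal{F})$ in $\Omega$ with respect to which
all functions $u \in \mathcal{F}$ are measurable.

(3) Whenever $(G_n)_{n \in \mathbb{N}}$ is a sequence of $\mathcal{F}$-open
sets, $\bigcup_{n=1}^{\infty} G_n$ is also $\mathcal{F}$-open.


Proof. (1) For $f \in \mathcal{F}^\sigma$ there exists an isotone sequence
$(u_n)$ in $\mathcal{F}_+$ such that $f = \sup u_n$. As a consequence of the
Stone condition, each of the functions

$$v_n = n (u_n - \inf(u_n, \alpha)) = n (u_n - \alpha)^+ \quad (n = 1, 2, \ldots)$$

lies in $\mathcal{F}_+$. Obviously $\lim_{n \to \infty} v_n(\omega) = 0$ for all
$\omega \in \{ f \le \alpha \}$ and
$\lim_{n \to \infty} v_n(\omega) = +\infty$ for all
$\omega \in \{ f > \alpha \}$; further, the sequence $(v_n)$ is isotone. But
then $w_n = \inf(v_n, 1)$, $n = 1, 2, \ldots$, is an isotone sequence in
$\mathcal{F}_+$ satisfying $\sup w_n = 1_{\{f > \alpha\}}$, that is,
$\{ f > \alpha \} \in \mathfrak{G}$.

(2) Every function $f \in \mathcal{F}^\sigma$ is a pointwise limit of a
sequence in $\mathcal{F}_+$ and thus is itself
$\mathfrak{A}(\mathcal{F})$-measurable. By (7.1.5) it now follows that
$\mathfrak{G} \subset \mathfrak{A}(\mathcal{F})$, that is,
$\mathfrak{A}(\mathfrak{G}) \subset \mathfrak{A}(\mathcal{F})$. By (1), every
function from $\mathcal{F}^\sigma$, and in particular from $\mathcal{F}_+$, is
$\mathfrak{A}(\mathfrak{G})$-measurable. Since every $u \in \mathcal{F}$ is the
difference of functions from $\mathcal{F}_+$, the
$\mathfrak{A}(\mathfrak{G})$-measurability of all $u \in \mathcal{F}$ now
follows, that is,
$\mathfrak{A}(\mathcal{F}) \subset \mathfrak{A}(\mathfrak{G})$ and hence
$\mathfrak{A}(\mathfrak{G}) = \mathfrak{A}(\mathcal{F})$. Finally,
$\mathfrak{G}$ is $\cap$-stable since $1_{G \cap H} = \inf(1_G, 1_H)$ holds for
arbitrary sets $G, H \subset \Omega$ and $\mathcal{F}^\sigma$ obviously
contains the infimum of any two functions $f, g \in \mathcal{F}^\sigma$.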
(3) Whenever the sequence $(f_n)$ lies in $\mathcal{F}^\sigma$, $\sup f_n$ also
lies in $\mathcal{F}^\sigma$. This is proved similarly to the analogous
assertion in Theorem 2.3.4. We note only that $\mathcal{F}_+$ and hence also
$\mathcal{F}^\sigma$ contains $\sup(f,g)$ as an element whenever it contains
$f$ and $g$. Hence we may assume that the sequence $(f_n)$ is isotone.
Therefore, whenever $\mathfrak{G}$ contains a sequence $(G_n)$, it also
contains $\bigcup G_n$, since $1_{\bigcup G_n} = \sup 1_{G_n}$. ∎


Remark. In general $\Omega$ is not $\mathcal{F}$-open. This is shown by
Example 1 for the ring $\mathcal{R} = \{\emptyset\}$. We then have
$\mathfrak{G} = \{\emptyset\}$.


Finally we come to: 


7.1.4. Theorem (P. Daniell-M. H. Stone). Let $\Omega$ be a set,
$\mathcal{F}$ a Stone vector lattice of real functions on $\Omega$, and $I$ an
abstract integral on $\mathcal{F}$. Then there exists exactly one measure
$\mu$ on the $\sigma$-algebra $\mathfrak{A}(\mathcal{F})$ with the following
properties:

$$\mathcal{F} \subset \mathcal{L}^1(\mu), \tag{7.1.6}$$

$$I(u) = \int u \, d\mu, \quad \text{for all } u \in \mathcal{F}; \tag{7.1.7}$$

$$\mu(A) = \begin{cases} \inf \{ \mu(G) : A \subset G \in \mathfrak{G} \}, & \text{if } A \subset G \text{ for some } G \in \mathfrak{G}, \\ +\infty, & \text{if } A \subset G \text{ for no } G \in \mathfrak{G}, \end{cases} \tag{7.1.8}$$

for all $A \in \mathfrak{A}(\mathcal{F})$.

Proof. We proceed in several steps:


1. For every isotone sequence $(u_n)$ in $\mathcal{F}_+$ and every
$v \in \mathcal{F}_+$,

$$v \le \sup u_n \implies I(v) \le \sup I(u_n). \tag{7.1.9}^2$$

If we set $v_n = \inf(v, u_n)$, then $(v_n)$ is an isotone sequence in
$\mathcal{F}_+$ such that
$\sup v_n = \inf(v, \sup u_n) = v \in \mathcal{F}_+$. By (7.1.4) it then
follows that $I(v) = \sup I(v_n)$. Taking into account the inequality
$v_n \le u_n$ which holds for all $n$, we have $I(v_n) \le I(u_n)$ and hence
$I(v) \le \sup I(u_n)$.

2. For any two isotone sequences $(u_n)$ and $(v_n)$ in $\mathcal{F}_+$,

$$\sup u_n = \sup v_n \implies \sup I(u_n) = \sup I(v_n). \tag{7.1.10}$$

This is obtained from (7.1.9) in the same way as Corollary 2.3.2 was
obtained from Theorem 2.3.1.


Thus we can define a numerical function $I^*$ on $\mathcal{F}^\sigma$ as
follows: For $f \in \mathcal{F}^\sigma$ let $(u_n)$ be an isotone sequence in
$\mathcal{F}_+$ with $f = \sup u_n$. By (7.1.10),


2 The reader should note the analogy with Theorem 2.3.1. 
$\sup I(u_n)$ is independent of the particular choice of $(u_n)$. Thus, we can set

$$I^*(f) = \sup I(u_n). \tag{7.1.11}$$

Then, obviously,

$$I^*(u) = I(u) \quad \text{for all } u \in \mathcal{F}_+. \tag{7.1.12}$$

We obtain the most important properties of $I^*$ similarly to Section 2.3:
The isotonicity of $I^*$ follows from (7.1.9), and the additivity and positive-homogeneity
from (7.1.11). As in the proof of Theorem 2.3.4, we show that

$$I^*(\sup f_n) = \sup I^*(f_n) \tag{7.1.13}$$

for every isotone sequence $(f_n)$ in $\mathcal{F}^\sigma$.


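3. Now we define an outer measure $\mu^*$ on $\Omega$. Let

$$\mu^*(G) = I^*(1_G) \quad \text{for } G \in \mathcal{G}, \tag{7.1.14}$$

$$\mu^*(Q) = \begin{cases} \inf \{\mu^*(G) : Q \subset G \in \mathcal{G}\}, & \text{if } Q \subset G \text{ for some } G \in \mathcal{G}, \\ +\infty, & \text{if } Q \subset G \text{ for no } G \in \mathcal{G}, \end{cases} \tag{7.1.15}$$

for $Q \in \mathcal{P}(\Omega)$.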


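The isotonicity of $I^*$ shows the compatibility of Definitions (7.1.14)
and (7.1.15) as well as the isotonicity of $\mu^*$. Moreover, $0 = 1_\emptyset$ and
$\emptyset \in \mathcal{G}$, so that $\mu^*(\emptyset) = 0$, implying $\mu^* \ge 0$. So we still have to show
that

$$\mu^*\Bigl(\bigcup_{n=1}^{\infty} Q_n\Bigr) \le \sum_{n=1}^{\infty} \mu^*(Q_n)$$

for an arbitrary sequence $(Q_n)$ in $\mathcal{P}(\Omega)$. Since the union of every sequence
of $\mathcal{F}$-open sets is $\mathcal{F}$-open, it obviously suffices to deal with the case in which
each $Q_n$ is $\mathcal{F}$-open. But then, using the properties of $I^*$ already established
in 2, we conclude from $1_{\bigcup Q_n} \le \sum 1_{Q_n}$ that

$$\mu^*\Bigl(\bigcup_{n=1}^{\infty} Q_n\Bigr) = I^*\bigl(1_{\bigcup Q_n}\bigr) \le I^*\Bigl(\sum_{n=1}^{\infty} 1_{Q_n}\Bigr) = I^*\Bigl(\sup_{N \in \mathbb{N}} \sum_{n=1}^{N} 1_{Q_n}\Bigr) = \sup_{N \in \mathbb{N}} \sum_{n=1}^{N} I^*(1_{Q_n}) = \sum_{n=1}^{\infty} \mu^*(Q_n).$$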


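4. The outer measure $\mu^*$ can also be defined by the equality

$$\mu^*(Q) = \begin{cases} \inf \{I^*(f) : 1_Q \le f \in \mathcal{F}^\sigma\}, & \text{if } 1_Q \le f \text{ for some } f \in \mathcal{F}^\sigma, \\ +\infty, & \text{if } 1_Q \le f \text{ for no } f \in \mathcal{F}^\sigma, \end{cases} \tag{7.1.16}$$

for arbitrary sets $Q \in \mathcal{P}(\Omega)$. Since, by definition, $1_G \in \mathcal{F}^\sigma$ for all $\mathcal{F}$-open
sets $G$, it suffices to prove that $1_Q \le f$ with $f \in \mathcal{F}^\sigma$ implies the existence of an
$\mathcal{F}$-open set $G \supset Q$ as well as $\mu^*(Q) \le I^*(f)$. But for every real number
$\alpha \in \,]0,1[$ the set $G_\alpha = \{f > \alpha\}$ is $\mathcal{F}$-open and satisfies $G_\alpha \supset Q$ and
$\alpha 1_{G_\alpha} \le f$. Therefore, $\mu^*(Q) \le \mu^*(G_\alpha) = I^*(1_{G_\alpha}) \le (1/\alpha) I^*(f)$ for all
$0 < \alpha < 1$. Letting $\alpha \to 1$, this implies $\mu^*(Q) \le I^*(f)$.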


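5. By Theorem 1.5.4, the system $\mathfrak{A}^*$ of all $\mu^*$-measurable sets, that is,
of all sets $A \subset \Omega$ satisfying

$$\mu^*(Q) \ge \mu^*(Q \cap A) + \mu^*(Q \setminus A) \quad \text{for all } Q \in \mathcal{P}(\Omega), \tag{7.1.17}$$

is a $\sigma$-algebra, and the restriction of $\mu^*$ to $\mathfrak{A}^*$ is a measure. We shall show that $\mathfrak{A}^*$ contains
every $\mathcal{F}$-open set $A$. Because of (7.1.15) and the isotonicity of $\mu^*$, we need
only verify that

$$\mu^*(G) \ge \mu^*(G \cap A) + \mu^*(G \setminus A)$$

for $\mathcal{F}$-open sets $G$ with $\mu^*(G) < +\infty$. Since $G \cap A \in \mathcal{G}$ we can also
assume that $A \subset G$, and in particular $\mu^*(A) < +\infty$. For $G$ and $A$ there
exist isotone sequences $(u_n)$ and $(v_n)$ in $\mathcal{F}_+$ such that $1_G = \sup u_n$ and
$1_A = \sup v_n$. Since $1_A \le 1_G$,

$$f_n = 1_G - v_n = \sup_{k \in \mathbb{N}} (u_k - v_n)^+$$

lies in $\mathcal{F}^\sigma$, and $I^*(f_n) = \mu^*(G) - I(v_n)$. The sequence $(f_n)$ is antitone.
Since $1_{G \setminus A} \le f_n$ for all $n$, using (7.1.16) we thus obtain

$$\mu^*(G \setminus A) \le \inf_{n \in \mathbb{N}} I^*(f_n) = \lim_{n \to \infty} I^*(f_n) = \mu^*(G) - \mu^*(A)$$

and hence

$$\mu^*(A) + \mu^*(G \setminus A) = \mu^*(G \cap A) + \mu^*(G \setminus A) \le \mu^*(G).$$

Thus $\mathcal{G} \subset \mathfrak{A}^*$, and hence $\mathfrak{A}(\mathcal{F}) = \mathfrak{A}(\mathcal{G}) \subset \mathfrak{A}^*$ has been verified because of
Lemma 7.1.3. Then, further, the restriction $\mu$ of $\mu^*$ to $\mathfrak{A}(\mathcal{F})$ is a measure
which, according to (7.1.15), satisfies the condition (7.1.8).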


6. $\mu$ also satisfies the conditions (7.1.6) and (7.1.7). For this, let $u \in \mathcal{F}_+$.
We need to show that $I(u) = \int u \, d\mu$, whence follows the $\mu$-integrability
of $u$ because $0 \le I(u) < +\infty$.

For every natural number $n$ and $i = 1, 2, \dots, n2^n$, the set

$$G_{in} = \{u > i/2^n\}$$

is $\mathcal{F}$-open by Lemma 7.1.3. Therefore,

$$f_n = \frac{1}{2^n} \sum_{i=1}^{n2^n} 1_{G_{in}} \tag{7.1.18}$$

lies in $\mathcal{F}^\sigma$. The sets

$$A_{in} = \{\, i/2^n < u \le (i+1)/2^n \,\}, \quad i = 1, \dots, n2^n - 1,$$

and

$$A_{n2^n,\,n} = \{u > n\}$$

are pairwise disjoint and elements of $\mathfrak{A}(\mathcal{F})$. Thus $G_{in} = \bigcup_{j=i}^{n2^n} A_{jn}$ implies

$$f_n = \sum_{i=1}^{n2^n} \frac{i}{2^n} \, 1_{A_{in}}.$$

The same reasoning as in the proof of Theorem 2.3.6 then shows the
isotonicity of $(f_n)$ and the equality $u = \sup f_n$. By (7.1.13) and the
Theorem of Monotone Convergence we need only show that $I^*(f_n) = \int f_n \, d\mu$
for every $n = 1, 2, \dots$. But this follows directly from (7.1.18)
and the definition of $\mu$, and in particular from (7.1.14).
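
The passage from the layered form (7.1.18) to the representation over the disjoint level sets can be checked numerically at a single point. The following Python sketch is only an illustration: the function names `layer_sum` and `level_sum` are hypothetical, and the number `u_val` stands in for the value $u(\omega)$ at one fixed point $\omega$.

```python
def layer_sum(u_val, n):
    """f_n as in (7.1.18): 2^{-n} times the number of i in 1..n*2^n with u > i/2^n."""
    count = sum(1 for i in range(1, n * 2**n + 1) if u_val > i / 2**n)
    return count / 2**n


def level_sum(u_val, n):
    """f_n via the disjoint sets: value i/2^n on {i/2^n < u <= (i+1)/2^n}, value n on {u > n}."""
    top = n * 2**n
    for i in range(1, top):
        if i / 2**n < u_val <= (i + 1) / 2**n:
            return i / 2**n
    # Either u > n (top level set) or u <= 2^{-n} (no level set contains the point).
    return float(n) if u_val > n else 0.0


if __name__ == "__main__":
    for n in range(1, 8):
        # Both representations agree; the values increase towards u_val = 0.3.
        print(n, layer_sum(0.3, n), level_sum(0.3, n))
```

At a point where $0 < u \le n$, both functions return the largest dyadic number $i/2^n$ strictly below $u$, so the approximation error is at most $2^{-n}$.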

For an arbitrary function $u \in \mathcal{F}$ we obtain its $\mu$-integrability and (7.1.7)
by decomposing $u$ into positive and negative parts.

7. The uniqueness of the measure $\mu$ follows from (7.1.8) provided the
values $\mu(G)$ for sets $G \in \mathcal{G}$ are uniquely determined by the remaining
conditions. But this can be seen as follows: For $G \in \mathcal{G}$ there exists an
isotone sequence $(u_n)$ in $\mathcal{F}_+$ with $1_G = \sup u_n$. Therefore for every measure
$\mu$ on $\mathfrak{A}(\mathcal{F})$ with the property (7.1.7), we have $1_G = \sup u_n$ and hence

$$\mu(G) = \int 1_G \, d\mu = \sup \int u_n \, d\mu = \sup I(u_n) = I^*(1_G).$$

Therefore, $\mu(G)$ is uniquely determined.
Thus all parts of the theorem have been proved. ∎


7.1.5. Corollary 1. Under the assumptions of Theorem 7.1.4, for
every real number $p$ satisfying $1 \le p < +\infty$, $\mathcal{F} \cap \mathcal{L}^p(\mu)$ is dense in
$\mathcal{L}^p(\mu)$ with respect to convergence in pth mean.³


3 In particular $\mathcal{F}$ is dense in $\mathcal{L}^1(\mu)$ by (7.1.6).
Proof. For every function $f \in \mathcal{L}^p(\mu)$ and every $\varepsilon > 0$ we have to prove
the existence of an $\mathcal{L}^p$-function $u \in \mathcal{F}$ such that

$$N_p(f - u) = \Bigl(\int |f - u|^p \, d\mu\Bigr)^{1/p} \le \varepsilon.$$

We solve this problem by simplifying the function $f$ step by step. Since
$|f|$, $f^+$, and $f^-$ are also in $\mathcal{L}^p(\mu)$ whenever $f$ is, and $N_p$ is a seminorm on
$\mathcal{L}^p(\mu)$, we can in addition assume $f \ge 0$. Then there is an isotone sequence
$(f_n)$ of $\mathfrak{A}(\mathcal{F})$-elementary functions such that $f = \sup f_n$. Since $0 \le f_n \le f$,
all $f_n$ lie in $\mathcal{L}^p(\mu)$. Therefore $\lim_{n \to \infty} N_p(f - f_n) = 0$ by the Theorem of Dominated
Convergence. Thus we can also assume that $f$ is an $\mathfrak{A}(\mathcal{F})$-elementary
function and, after another application of the seminorm property of $N_p$,
that it is an indicator function $1_A$ of a set $A \in \mathfrak{A}(\mathcal{F})$ with $\mu(A) = [N_p(1_A)]^p < +\infty$.
But then (7.1.8) applies. There exists an $\mathcal{F}$-open set $G$ containing
$A$ such that

$$(\mu(G) - \mu(A))^{1/p} = N_p(1_G - 1_A) \le \frac{\varepsilon}{2}.$$

Since $G$ is $\mathcal{F}$-open, there is an isotone sequence $(u_n)$ in $\mathcal{F}_+$ with $1_G = \sup u_n$.
It then follows that $u_n \in \mathcal{L}^p(\mu)$ for all $n$ and $\lim_{n \to \infty} N_p(1_G - u_n) = 0$. Thus
there is an $n_0$ for which

$$N_p(1_G - u_{n_0}) \le \frac{\varepsilon}{2}.$$

Hence

$$N_p(1_A - u_{n_0}) \le N_p(1_A - 1_G) + N_p(1_G - u_{n_0}) \le \varepsilon.$$

Thus $u = u_{n_0}$ yields the desired result. ∎


7.1.6. Corollary 2. If there exists an isotone sequence $(u_n)$ in $\mathcal{F}_+$
such that

$$\sup u_n(\omega) > 0 \quad \text{for all } \omega \in \Omega; \tag{7.1.19}$$

$$\sup I(u_n) < +\infty, \tag{7.1.20}$$

then the measure $\mu$ of Theorem 7.1.4 is $\sigma$-finite and is uniquely determined
by the properties (7.1.6) and (7.1.7) alone.


Proof. The function $f = \sup u_n$ lies in $\mathcal{F}^\sigma$ and is strictly positive.
Therefore, $\Omega = \bigcup_{k=1}^{\infty} G_k$ for the sets $G_k = \{f > 1/k\}$, $k \in \mathbb{N}$, which are
$\mathcal{F}$-open according to Lemma 7.1.3. Since $1_{G_k} \le kf$, then

$$\mu(G_k) \le k \int f \, d\mu = k \sup \int u_n \, d\mu = k \sup I(u_n) < +\infty$$

for every measure $\mu$ on $\mathfrak{A}(\mathcal{F})$ satisfying the conditions (7.1.6) and (7.1.7).
This proves the $\sigma$-finiteness of $\mu$. The rest of the assertion follows from
the Uniqueness Theorem 1.5.5 since $\mathcal{G}$ is an $\cap$-stable generator of $\mathfrak{A}(\mathcal{F})$
on which the values $\mu(G)$ are uniquely determined by the conditions
(7.1.6) and (7.1.7). ∎


Note that the conditions of Corollary 7.1.6 are satisfied if the constant
function 1 lies in $\mathcal{F}$. Then we can choose $u_n = 1$ for all $n = 1, 2, \dots$.


Examples 


4. Let $\Omega$ be a set, $\mathcal{R}$ a ring of sets in $\Omega$, and $\mu$ a finite content on $\mathcal{R}$. For
every function $u = \sum_{i=1}^{n} \alpha_i 1_{A_i}$ from the Stone vector lattice $\mathcal{F} = \mathcal{F}(\mathcal{R})$
of Example 1, define

$$I_\mu(u) = \sum_{i=1}^{n} \alpha_i \mu(A_i).$$

Reasoning analogous to that in Section 2.2 (via normal representations)
shows that $I_\mu(u)$ is independent of the particular representation of $u$ and
hence $I_\mu : \mathcal{F} \to \mathbb{R}$ is a positive linear form on $\mathcal{F}$. We now have:

$I_\mu$ is an abstract integral if and only if $\mu$ is $\sigma$-additive, that is, is a premeasure.

The second half of the assertion is obvious since for every sequence
$(A_n)$ in $\mathcal{R}$ with $A_n \downarrow \emptyset$, the sequence $(1_{A_n})$ is antitone and $\inf 1_{A_n} = 0$,
so that $\mu(A_n) = I_\mu(1_{A_n})$ tends to 0 whenever $I_\mu$ is an abstract integral.
Thus $\mu$ is $\sigma$-additive if $I_\mu$ is an abstract integral. For the converse we reason
as in the proof of Theorem 2.3.1.

Now we obviously have $\mathfrak{A}(\mathcal{F}) = \mathfrak{A}(\mathcal{R})$. Thus Theorem 7.1.4 says: If $\mu$
is a premeasure, there is at least one measure $\tilde{\mu}$ on $\mathfrak{A}(\mathcal{R})$ with $I_\mu(u) = \int u \, d\tilde{\mu}$
for all $u \in \mathcal{F}$, that is, with

$$\tilde{\mu}(A) = \mu(A)$$

for all $A \in \mathcal{R}$. Thus we obtain once more the familiar theorem whereby
$\mu$ can be extended to a measure $\tilde{\mu}$ on $\mathfrak{A}(\mathcal{R})$. $\tilde{\mu}$ is uniquely determined if we
also require (7.1.8). The reader should verify that we obtain the measure
constructed in Theorem 1.5.2. The $\mathcal{F}$-open sets here are indeed all sets of
the form $\bigcup_{n=1}^{\infty} A_n$ where $(A_n)$ is a sequence in $\mathcal{R}$.


5. Let $\mathcal{F}$ be the Stone vector lattice of real functions on $[0,1]$ defined in
Example 2. Now

$$I(u) = u'(0) = \lim_{x \to 0} \frac{u(x)}{x}$$

defines a positive linear form on $\mathcal{F}$. This is not an abstract integral. Indeed,
consider the functions $u_n(x) = \inf(x, 1/n)$, $x \in [0,1]$, $n = 1, 2, \dots$.
Then $(u_n)$ is an antitone sequence in $\mathcal{F}_+$ with $\inf u_n = 0$, but with $I(u_n) = 1$
for all $n$.
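
A quick numerical check of this counterexample, as a sketch only. The difference quotient at a small hypothetical step `h` stands in for the limit defining $I$, and the function names are illustrative and not taken from the text.

```python
def u(n, x):
    """u_n(x) = inf(x, 1/n) on [0, 1]."""
    return min(x, 1.0 / n)


def I_approx(f, h=1e-9):
    """Approximate I(f) = lim_{x -> 0} f(x)/x by the quotient at x = h (an assumption)."""
    return f(h) / h


if __name__ == "__main__":
    for n in (1, 10, 1000):
        slope = I_approx(lambda x, n=n: u(n, x))  # equals 1 as long as h <= 1/n
        value = u(n, 0.5)                         # tends to 0 as n grows
        print(n, slope, value)
```

The slope at the origin stays at 1 for every `n`, while the function values at any fixed point shrink to 0, which is exactly the failure of continuity along antitone sequences.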


Remark. Example 4 above can be constructed in such a way that, starting
with a premeasure $\mu$, we immediately obtain the theory of the integral
relative to the measure $\tilde{\mu}$ above without having defined the latter in
advance. To do this we start with the concept of abstract integral and use
the Daniell-Stone Theorem. Then the abstract integral, at first defined
only on the system $\mathcal{F}$ of "elementary functions," is extended to the more
comprehensive domain $\mathcal{L}^1(\tilde{\mu})$ with preservation of its properties. This
extension is then the "concrete" integral. For details see Aumann [23],
where the presentation corresponds in part to the original work of Daniell
[29] and Stone [43].


Condition (7.1.8) is a regularity condition. It will appear in the next
section in a sharper form. The refinement will be, primarily, that the
$\mathcal{F}$-open sets coincide with the open subsets of $\Omega$ for certain Stone vector
lattices $\mathcal{F}$ of continuous real functions on a topological space $\Omega$.


PROBLEMS 


1. Prove: A set $G \subset \Omega$ is $\mathcal{F}$-open if and only if it is the union of a sequence
of sets of the form $\{u > 0\}$ where $u \in \mathcal{F}_+$.

2. Let $\mathcal{F}$ be a Stone vector lattice of real functions on a set $\Omega$. Denote
by $\mathcal{F}_0$ the set of all bounded functions $g \in \mathcal{F}$ having the following
property: There exists a function $h \in \mathcal{F}_+$ (depending on $g$) such that
$h(\omega) < 1$ implies $g(\omega) = 0$ ($\omega \in \Omega$). Prove:

(a) $\mathcal{F}_0$ is a Stone vector lattice.

(b) $(f - \alpha)^+ \in \mathcal{F}_0$ for all bounded functions $f \in \mathcal{F}$ and all $\alpha > 0$.

(c) $\mathcal{F}_0$-open sets and $\mathcal{F}$-open sets coincide. [Hint: Study the proof of
7.1.3, (1) and use Problem 1.]

3. Let $\mathcal{R}$ be a ring in a set $\Omega$ and let $\mathcal{F}$ be the Stone vector lattice $\mathcal{F}(\mathcal{R})$ of
Example 1. Prove:

(a) $\{u > \alpha\} \in \mathcal{R}$ for all $u \in \mathcal{F}_+$ and all $\alpha > 0$.

(b) A set is $\mathcal{F}$-open if and only if it is the union of a sequence of
sets in $\mathcal{R}$.

4. Let $(\Omega, \mathfrak{A}, \mu)$ be a $\sigma$-finite measure space. Denote by $\mathcal{F}$ the vector
space of all functions $u = \sum_{i=1}^{n} \alpha_i 1_{A_i}$, where each $A_i \in \mathfrak{A}$ has finite
$\mu$-measure ($\alpha_i \in \mathbb{R}$; $n \in \mathbb{N}$). Prove: $\mathcal{F}$ is dense in $\mathcal{L}^p(\mu)$ with respect
to convergence in pth mean ($1 \le p < +\infty$).


7.2 BAIRE AND BOREL SETS AND MEASURES


Let $E$ be a topological space, $\mathcal{O}$ the system of its open sets defining the
topology, and $\mathcal{C} = \mathcal{C}(E) = \mathcal{C}(E,\mathbb{R})$ [$\mathcal{C}^b = \mathcal{C}^b(E) = \mathcal{C}^b(E,\mathbb{R})$] the Stone
vector lattice of all continuous real [continuous real bounded] functions
defined on $E$. Since $1 \in \mathcal{C}^b \subset \mathcal{C}$, the Stone condition is trivially satisfied.


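7.2.1. Definition. The $\sigma$-algebra $\mathfrak{A}(\mathcal{O})$ generated in $E$ by $\mathcal{O}$ is called
the $\sigma$-algebra $\mathcal{B} = \mathcal{B}(E)$ of Borel sets in $E$. The smallest $\sigma$-algebra $\mathfrak{A}(\mathcal{C})$
in $E$ with respect to which all functions $f \in \mathcal{C}(E)$ are $\mathfrak{A}(\mathcal{C})$-$\mathcal{B}^1$-measurable
is called the $\sigma$-algebra $\mathcal{B}_0 = \mathcal{B}_0(E)$ of Baire sets in $E$.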


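Since the closed sets are the complements of open ones, $\mathcal{B}(E)$ is also
generated by the system of closed sets of $E$.

Every function $f \in \mathcal{C}$ is $\mathcal{B}(E)$-measurable since $\{f > \alpha\}$ is open for
every $\alpha \in \mathbb{R}$. Consequently,

$$\mathcal{B}_0(E) \subset \mathcal{B}(E), \tag{7.2.1}$$

that is, every Baire set in $E$ is Borel.
Every function $f \in \mathcal{C}$ is a pointwise limit of a sequence $(f_n)$ in $\mathcal{C}^b$, say
the sequence $f_n = \inf(\sup(f, -n), n)$. Therefore,

$$\mathcal{B}_0(E) = \mathfrak{A}(\mathcal{C}^b(E)). \tag{7.2.2}$$

Examples

1. By Theorem 1.6.4 we have

$$\mathcal{B}(\mathbb{R}^p) = \mathcal{B}^p \quad (p = 1, 2, \dots). \tag{7.2.3}$$

In Corollary 7.2.4 we shall also obtain $\mathcal{B}_0(\mathbb{R}^p) = \mathcal{B}(\mathbb{R}^p)$.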


2. Let $E$ be a discrete space, that is, $\mathcal{O} = \mathcal{P}(E)$. Then all real functions
on $E$ are continuous. Thus $\mathcal{B}_0(E) = \mathcal{B}(E) = \mathcal{P}(E)$. The system $\mathcal{K}$ of
compact subsets of $E$ is in general not a generator of $\mathcal{B}_0(E) = \mathcal{B}(E)$.
Obviously $\mathcal{K}$ consists of all finite subsets of $E$; therefore, $\mathfrak{A}(\mathcal{K})$ is the
$\sigma$-algebra of all sets $A \subset E$ for which either $A$ or $\complement A$ is countable. Thus
$\mathfrak{A}(\mathcal{K}) = \mathcal{B}_0(E)$ if and only if $E$ is countable.


3. Let $Q$ be a subspace of a topological space $E$. Then $\mathcal{B}(Q) = Q \cap \mathcal{B}(E)$,
that is, $\mathcal{B}(Q)$ is the trace of $\mathcal{B}(E)$ in $Q$. In fact, $\{Q \cap G : G \in \mathcal{O}\}$ is the
system of open sets of $Q$ and thus a generator of $\mathcal{B}(Q)$. Since $Q \cap \mathcal{B}(E)$
is a $\sigma$-algebra in $Q$ containing this generator, we have

$$\mathcal{B}(Q) \subset Q \cap \mathcal{B}(E).$$

The system $\{A \subset E : Q \cap A \in \mathcal{B}(Q)\}$ is obviously a $\sigma$-algebra in $E$,
which contains the generator of $\mathcal{B}(E)$ consisting of all open sets. Therefore
we also have $Q \cap \mathcal{B}(E) \subset \mathcal{B}(Q)$.⁴

If $Q$ itself is Borel in $E$, then $\mathcal{B}(Q)$ consists of all Borel subsets of $E$
which are contained in $Q$.


A subset $A$ of a topological space $E$ is usually called an $F_\sigma$-set ($G_\delta$-set) if it
is the union (intersection) of a sequence of closed (open) sets. We call $A$ a
$K_\sigma$-set if it is the union of a sequence of compact sets.


We say that $E$ is normal (see Franz [32], p. 64) if the Hausdorff separation
axiom is satisfied⁵ and if for any two disjoint closed sets $F_0, F_1 \subset E$
there exists a function $f \in \mathcal{C}$ satisfying $0 \le f \le 1$, $f(x) = 0$ for all $x \in F_0$
and $f(x) = 1$ for all $x \in F_1$. Every metric and every compact space is
normal.


We shall now connect these notions with the considerations of the
preceding section.


7.2.2. Lemma. For every subset $G$ of a normal space $E$, the following
properties are equivalent:

(1) $G$ is $\mathcal{C}(E)$-open.

(2) $G$ is $\mathcal{C}^b(E)$-open.

(3) $G$ is an open $F_\sigma$-set.

(4) There exists a function $f \in \mathcal{C}^b(E)$ with $f \ge 0$ and $G = \{f > 0\}$.

The implications (1) ⇒ (2) ⇒ (3) and (4) ⇒ (1) hold in any topological
space $E$.


Proof. (1) ⇒ (2): Every $u \in \mathcal{C}$ with $0 \le u \le 1_G$ is bounded.


(2) ⇒ (3): There is an isotone sequence $(u_n)$ in $\mathcal{C}^b_+$ such that $1_G = \sup u_n$.
Then

$$G = \bigcup_{n=1}^{\infty} \{u_n > 0\}$$



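4 More generally, this reasoning tells us the following: Let $(\Omega, \mathfrak{A})$ be a measurable
space, $\mathcal{E}$ a generator of $\mathfrak{A}$, $\Omega' \subset \Omega$, and $\mathcal{E}' = \{\Omega' \cap E : E \in \mathcal{E}\}$. Then $\Omega' \cap \mathfrak{A}$ is generated
by $\mathcal{E}'$ (in $\Omega'$).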

5 The reader should note that the proofs below of assertions 7.2.2, 7.2.3, 7.2.5, 7.2.6 
for normal spaces do not refer to the Hausdorff separation axiom. 
is open as the union of open sets, and since

$$G = \bigcup_{n=1}^{\infty} \{u_n \ge 1/n\}$$

it is an $F_\sigma$-set.


(3) ⇒ (4): For $G$ there is a sequence $(F_n)$ of closed sets such that
$G = \bigcup_{n=1}^{\infty} F_n$. Since $F_n$ and $\complement G$ are disjoint closed sets, due to the normality
of $E$ there is a function $f_n \in \mathcal{C}^b$ satisfying $0 \le f_n \le 1$, $f_n(x) = 0$ for
all $x \in \complement G$ and $f_n(x) = 1$ for all $x \in F_n$. The series $\sum_{n=1}^{\infty} (1/n^2) f_n(x)$ is
uniformly convergent on $E$ and thus defines a function $f \in \mathcal{C}^b_+$. For this $f$
we have $G = \{f > 0\}$ by construction.


(4) ⇒ (1): This follows directly from Lemma 7.1.3. ∎


7.2.3. Corollary 1. The $\sigma$-algebra of Baire sets of a normal space is
generated by the system of open $F_\sigma$-sets as well as by the system of closed
$G_\delta$-sets.


Proof. This follows from Lemmas 7.1.3 and 7.2.2. We have only to
note that the complements of the open $F_\sigma$-sets are just the closed $G_\delta$-sets
of the space. ∎


7.2.4. Corollary 2. For every metrizable space $E$, $\mathcal{B}_0(E) = \mathcal{B}(E)$.


Proof. Every closed set $F$ in $E$ is a $G_\delta$-set. For if $d$ denotes a metric
defining the topology of $E$, then

$$G_n = \Bigl\{ x \in E : d(x, F) < \frac{1}{n} \Bigr\}$$

is open for every $n \in \mathbb{N}$ and $F = \bigcap_{n=1}^{\infty} G_n$. But the closed sets in $E$ generate
$\mathcal{B}(E)$. ∎


Every finite measure defined on $\mathcal{B}_0(E)$ [$\mathcal{B}(E)$] is called a finite Baire
(Borel) measure on the topological space $E$. If we note that $\mathcal{C}^b$ contains
the constant function 1, then Theorem 7.1.4 together with Corollary 7.1.6
gives us:


7.2.5. Theorem. If $E$ is a topological space, then for every abstract
integral $I$ on $\mathcal{C}^b(E)$, there exists exactly one measure $\mu$ defined on $\mathcal{B}_0(E)$
such that $\mathcal{C}^b(E) \subset \mathcal{L}^1(\mu)$ and

$$I(f) = \int f \, d\mu$$

for all $f \in \mathcal{C}^b(E)$. $\mu$ is a finite Baire measure on $E$.


Conversely, for every finite Baire measure $\mu$ on a topological space $E$,
the mapping $f \mapsto \int f \, d\mu$ is an abstract integral on $\mathcal{C}^b(E)$. Thus we can
consider abstract integrals on $\mathcal{C}^b(E)$ and finite Baire measures on $E$ as one
and the same mathematical object for every topological space $E$. Hence
we have further:


7.2.6. Corollary. For every finite Baire measure $\mu$ on a topological
space $E$ and every Baire set $A \subset E$,

$$\mu(A) = \inf \{\mu(G) : A \subset G,\ G \text{ open},\ G \in \mathcal{B}_0(E)\}, \tag{7.2.4}$$

$$\mu(A) = \sup \{\mu(F) : F \subset A,\ F \text{ closed},\ F \in \mathcal{B}_0(E)\}. \tag{7.2.5}$$


Proof. By Corollary 7.1.6, (7.2.4) follows from (7.1.8). If we apply
(7.2.4) to $\complement A$ in place of $A$, then (7.2.5) follows because

$$\mu(A) = \mu(E) - \mu(\complement A). \qquad ∎$$


Note that this corollary takes on a particularly simple form for metrizable
spaces $E$ since then $\mathcal{B}_0(E) = \mathcal{B}(E)$ and (by the proof of Corollary
7.2.4) every open subset of $E$ is an $F_\sigma$-set.


Examples 


4. Let $E$ be a topological space and $\varepsilon_a$ the finite Baire measure on $E$
defined by the unit mass in $a \in E$. Then $\varepsilon_a$ is the only finite Baire measure
$\mu$ on $E$ satisfying

$$\int f \, d\mu = f(a)$$

for all $f \in \mathcal{C}^b(E)$.


5. Let $E = [a,b]$ be a compact interval in $\mathbb{R}$. The L-B-measure $\lambda$ on $E$
is the only finite Borel measure $\mu$ on $E$ such that

$$\int f \, d\mu = \int_a^b f(x) \, dx \quad \text{for all } f \in \mathcal{C}(E).$$


Remark. In general $\mathcal{B}_0(E) \ne \mathcal{B}(E)$. This will be shown by Example 5
in Section 7.5, where $E$ will even be compact (or by the following
Problem).


PROBLEM


Let $E$ be an infinite set with the so-called cofinal topology, that is, a
set $O \subset E$ is open if and only if $\complement O$ is finite or $E$. Prove:

(a) Two nonempty open sets of $E$ cannot be disjoint. In particular, $E$
is not a Hausdorff space if it contains more than one point.

(b) $\mathcal{C}(E)$ only consists of the constant real functions on $E$.

(c) $\mathcal{B}_0(E) = \{\emptyset, E\}$.

(d) $\mathcal{B}(E)$ is the $\sigma$-algebra (of Section 1.1, Problem 2) consisting of all
sets $A \subset E$ for which $A$ or $\complement A$ is countable. Hence, $\mathcal{B}_0(E) \ne \mathcal{B}(E)$
if $E$ contains more than one point.


7.3 REGULARITY OF FINITE BOREL 
MEASURES ON POLISH SPACES 


A refined form of property (7.2.5) of a finite Baire measure on a topo- 
logical space can be established in two special cases, one of which we 
shall discuss now. 


7.3.1. Definition. Let $E$ be a topological space and $\mu$ a measure
defined on the $\sigma$-algebra $\mathcal{B}(E)$ of Borel sets of $E$. We say that $\mu$ is regular
if for every set $A \in \mathcal{B}(E)$,

$$\mu(A) = \inf \{\mu(G) : A \subset G,\ G \text{ open}\} \quad \text{(outer regularity)}, \tag{7.3.1}$$

$$\mu(A) = \sup \{\mu(K) : K \subset A,\ K \text{ compact}\} \quad \text{(inner regularity)}. \tag{7.3.2}$$

A measure $\mu$ defined on $\mathcal{B}_0(E)$ is said to be regular if for every $A \in \mathcal{B}_0(E)$,

$$\mu(A) = \inf \{\mu(G) : A \subset G,\ G \text{ open},\ G \in \mathcal{B}_0(E)\}, \tag{7.3.1'}$$

$$\mu(A) = \sup \{\mu(K) : K \subset A,\ K \text{ compact},\ K \in \mathcal{B}_0(E)\}. \tag{7.3.2'}$$

Properties (7.3.1') and (7.3.2') are also called outer and inner regularity
of $\mu$, respectively.


Every finite Baire measure on a topological space has the property of 
outer regularity by (7.2.4). By Corollary 7.2.6 every finite Baire measure 
on a compact space is regular since closed subsets coincide with compact 
ones in compact spaces. 


7.3.2. Definition. A topological space $E$ is called Polish⁶ if there is a
complete metric defining its topology and if $E$ has a countable base.


6 The name, due to Bourbaki, recalls the achievements of Polish topologists during
the developmental years of general topology.




Remember that a metric is said to be complete if the associated metric 
space is complete, that is, if the Cauchy convergence criterion holds in it. 
Countable base means the existence of countably many open sets such that 
every open set is the union of certain of these sets.⁷


Examples 


1. The Euclidean space $\mathbb{R}^p$ is Polish ($p = 1, 2, \dots$). It suffices to consider
the Euclidean metric on $\mathbb{R}^p$.


2. The product $E' \times E''$ of two Polish spaces is Polish. If $d'$ and $d''$ are
complete metrics defining the topology of $E'$ and $E''$, respectively, then

$$d(x,y) = d'(x',y') + d''(x'',y'')$$

is a complete metric defining the topology of $E' \times E''$. Here we let $x = (x',x'')$
and $y = (y',y'')$ be points of $E' \times E''$. Further, if $\mathcal{G}'$ ($\mathcal{G}''$) is a
countable base of $E'$ ($E''$), then $\{G' \times G'' : G' \in \mathcal{G}',\ G'' \in \mathcal{G}''\}$ is a
countable base of $E' \times E''$.


3. Every closed subspace $F$ of a Polish space $E$ is Polish. It suffices to
restrict a complete metric defining the topology of $E$ to $F$.


4. Every open subspace G of a Polish space E is Polish. 


Proof. We can assume $G \ne E$. By Examples 1 and 2, $\mathbb{R} \times E$ is Polish.
In this product we consider the set $F$ of all $(\lambda, x) \in \mathbb{R} \times E$ with
$\lambda \cdot d(x, E \setminus G) = 1$. As usual, $d(x, A)$ is the distance of a point $x \in E$
from a set $A \subset E$; the mapping $x \mapsto d(x, A)$ is known to be continuous
on $E$. Then, in particular, $(\lambda, x) \mapsto \lambda \cdot d(x, E \setminus G)$ is a continuous real
function on $\mathbb{R} \times E$; thus, $F$ is a closed subset of $\mathbb{R} \times E$ and hence a
Polish space by Example 3. But $F$ is obviously mapped onto $G$ homeomorphically
by $(\lambda, x) \mapsto x$. We need note only that $E \setminus G$ is closed and
hence $G = \{x \in E : d(x, E \setminus G) > 0\}$. ∎
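
To make the homeomorphism concrete, one can transport the product metric of Example 2 from $F$ back to $G$. The following LaTeX sketch writes out the resulting metric; the symbol $d_G$ and the choice $G = \,]0,1[\, \subset \mathbb{R}$ are illustrative assumptions and not notation from the text.

```latex
% Inverse of the projection (\lambda, x) \mapsto x restricted to F:
%   x \mapsto \bigl( 1/d(x, E \setminus G),\; x \bigr), \qquad x \in G.
% Pulling back the metric |\lambda - \lambda'| + d(x, y) on \mathbb{R} \times E gives
\[
  d_G(x, y) \;=\; d(x, y) \;+\;
  \left| \frac{1}{d(x, E \setminus G)} - \frac{1}{d(y, E \setminus G)} \right|,
  \qquad x, y \in G .
\]
% Illustration: E = \mathbb{R}, G = ]0,1[, so d(x, E \setminus G) = \min(x, 1 - x).
% The sequence x_n = 1/n (n \ge 2) is Cauchy for |x - y| but not for d_G, since
% 1/d(x_n, E \setminus G) = n is unbounded; the boundary point 0 is pushed "to infinity".
```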


5. More generally we have (see Bourbaki [26]): A subspace $A$ of a Polish
space $E$ is Polish if and only if $A$ is a $G_\delta$-set in $E$. Thus, for example, the
space $J$ of all irrational numbers taken as a subspace of $\mathbb{R}$ is Polish. We
have

$$J = \bigcap_{x \text{ rational}} (\mathbb{R} \setminus \{x\}).$$


7 For metrizable spaces $E$ this is equivalent to the existence of a countable, dense
subset in $E$.
6. Every compact space $E$ with countable basis is Polish. By a well-known
theorem of Urysohn (see Franz [32], p. 100), $E$ is metrizable. Because of
the compactness, every metric defining the topology of $E$ is complete.


The significance of Polish spaces for measure theory rests primarily on 
the following result. Note that Polish spaces are metrizable and thus the 
Borel sets coincide with the Baire sets. 


7.3.3. Theorem. Every finite Borel measure $\mu$ on a Polish space $E$ is
regular.


Proof. We need to prove only the inner regularity of $\mu$. By (7.2.5) it
suffices to treat the case of a closed set $A \subset E$. Since for compact $K$, the
set $A \cap K$ is compact and according to (1.3.5)

$$\mu(A) - \mu(A \cap K) = \mu(A \cup K) - \mu(K) \le \mu(E) - \mu(K),$$

it remains to show that for every number $\varepsilon > 0$ there is a compact set
$K \subset E$ such that

$$\mu(E) - \mu(K) \le \varepsilon. \tag{7.3.3}$$

Now let $d$ be a complete metric defining the topology of $E$ and let $(x_n)_{n \in \mathbb{N}}$
be a dense sequence of points in $E$. Let $K_r(x)$ denote the closed sphere
with center $x$ and radius $r$ (relative to $d$). Then $E = \bigcup_{i=1}^{\infty} K_r(x_i)$ for every
$r > 0$ since every sphere $K_r(x)$ contains some $x_i$ and thus $x$ lies in $K_r(x_i)$.
Since $\mu$ is continuous from below,

$$\mu(E) = \lim_{k \to \infty} \mu\Bigl(\bigcup_{i=1}^{k} K_r(x_i)\Bigr).$$

Thus for every $\varepsilon > 0$ and $n \in \mathbb{N}$ there is a $k_n \in \mathbb{N}$ such that

$$\mu\Bigl(\bigcup_{i=1}^{k_n} K_{1/n}(x_i)\Bigr) \ge \mu(E) - \frac{\varepsilon}{2^n}.$$

Each of the sets $B_n = \bigcup_{i=1}^{k_n} K_{1/n}(x_i)$ is closed, and we have

$$\mu\Bigl(\bigcap_{i=1}^{n} B_i\Bigr) \ge \mu(E) - \varepsilon \sum_{i=1}^{n} 2^{-i}.$$

By a derivation similar to that in the proof of Theorem 1.4.4 this is shown
by means of the inequality

$$\mu(B_1 \cap \dots \cap B_{n+1}) = \mu(B_1 \cap \dots \cap B_n) + \mu(B_{n+1}) - \mu((B_1 \cap \dots \cap B_n) \cup B_{n+1}) \ge \mu(B_1 \cap \dots \cap B_n) + \mu(B_{n+1}) - \mu(E)$$

using induction. The set $K = \bigcap_{n=1}^{\infty} B_n$ is then closed, and we have

$$\mu(K) = \lim_{n \to \infty} \mu\Bigl(\bigcap_{i=1}^{n} B_i\Bigr) \ge \mu(E) - \varepsilon \sum_{n=1}^{\infty} 2^{-n} = \mu(E) - \varepsilon$$

and thus (7.3.3).
It remains only to show the compactness of $K$. But for every $n \in \mathbb{N}$ we have

$$K \subset B_n = \bigcup_{i=1}^{k_n} K_{1/n}(x_i);$$

further, each of the sets $K_{1/n}(x_i)$ has diameter $\le 2/n$. Therefore, $K$ is
precompact⁸ and closed, and thus compact due to the completeness of the
metric space $E$. ∎
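
The induction indicated in the proof can be written out as follows. This is a sketch in LaTeX that reads the bound in the choice of $k_n$ as $\varepsilon/2^n$ and uses the abbreviation $C_n = B_1 \cap \dots \cap B_n$, which is introduced here and does not appear in the text.

```latex
% Claim: \mu(C_n) \ge \mu(E) - \varepsilon \sum_{i=1}^{n} 2^{-i}, where C_n = B_1 \cap \dots \cap B_n.
% The case n = 1 is the choice of k_1:  \mu(B_1) \ge \mu(E) - \varepsilon/2.
\begin{align*}
  \mu(C_{n+1})
    &= \mu(C_n) + \mu(B_{n+1}) - \mu(C_n \cup B_{n+1}) \\
    &\ge \Bigl(\mu(E) - \varepsilon \sum_{i=1}^{n} 2^{-i}\Bigr)
       + \Bigl(\mu(E) - \varepsilon\, 2^{-(n+1)}\Bigr) - \mu(E) \\
    &= \mu(E) - \varepsilon \sum_{i=1}^{n+1} 2^{-i}.
\end{align*}
% First line: additivity of the finite measure \mu, with C_{n+1} = C_n \cap B_{n+1}.
% Second line: induction hypothesis, the choice of k_{n+1}, and \mu(C_n \cup B_{n+1}) \le \mu(E).
```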


PROBLEM 


Let $E$ be a Polish space. Prove: The measures $\varepsilon_a$, $a \in E$, are the only
probability measures on $\mathcal{B}(E)$ which only attain the values 0 and 1.
(Hint: Consider a probability measure $\mu$ on $\mathcal{B}(E)$ which only attains 0
and 1 as values. Prove that the system $\mathcal{K}$ of all compact sets $K$ with
$\mu(K) = 1$ is $\cap$-stable. Study $\bigcap_{K \in \mathcal{K}} K$.)


7.4 SOME PROPERTIES OF LOCALLY COMPACT SPACES 


A topological space $E$ is said to be locally compact if it is Hausdorff and
each of its points has at least one compact neighborhood. Examples of
such spaces are the Euclidean space $\mathbb{R}^p$, every manifold (that is, every
locally Euclidean Hausdorff space), every discrete, and every compact
space.

By removing an arbitrary point from a compact space we obtain a 
locally compact space. This follows from the regularity of compact spaces. 
In this fashion we indeed obtain all locally compact spaces. If E is a locally 


8 Precompact sets are also called totally bounded. See Franz [32], pp. 87 and 91.




compact space, $\mathcal{O}$ the system of its open sets and $\omega_0$ a point not lying in $E$,
then we can define a topology on $E' = E \cup \{\omega_0\}$ as follows: Let the system
$\mathcal{O}'$ of open sets of $E'$ be the union of $\mathcal{O}$ with the system of all sets $E' \setminus K$,
where $K$ runs through all compact subsets of $E$. Thus, $E'$ becomes a
compact space and $E$ an open subspace of $E'$.⁹ If $E$ is itself already
compact, then $\omega_0$ is an isolated point of $E'$. If $E$ is not compact, then $E$ is
dense in $E'$. The space $E'$ is called the (Alexandrov) one point compactification
of $E$.¹⁰ We call $\omega_0$ the point at infinity of $E$.

We build the further theory of locally compact spaces on the existence
of $E'$. First we study a special class of functions in $\mathcal{C}^b(E)$.


7.4.1. Definition. Let $f : E \to \mathbb{R}$ be a real function on a topological
space $E$. Then

$$S_f = \overline{\{f \ne 0\}} \tag{7.4.1}$$

is called the support of $f$.


The complement of $S_f$ is thus the largest open set on which $f$ is equal to
zero. For a locally compact space $E$, let

$$\mathcal{C}^c = \mathcal{C}^c(E)$$

denote the set of all continuous real functions on $E$ with compact support.
A function $f \in \mathcal{C}(E)$ thus lies in $\mathcal{C}^c(E)$ if and only if $f$ is zero on the
complement of a suitable compact subset of $E$. We have $\mathcal{C}^c(E) \subset \mathcal{C}^b(E)$
since $f$ is bounded on the compact support $S_f$ and thus on all of $E$.


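For any $n$ functions $f_1, \dots, f_n \in \mathcal{C}^c$, the function $\varphi(f_1, \dots, f_n)$ also lies in $\mathcal{C}^c$ if
$\varphi \in \mathcal{C}(\mathbb{R}^n)$ and $\varphi(0, \dots, 0) = 0$ ($n = 1, 2, \dots$). Indeed,
$S_{\varphi(f_1, \dots, f_n)} \subset S_{f_1} \cup \dots \cup S_{f_n}$.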


Then in particular $\mathcal{C}^c$ is a linear subspace of $\mathcal{C}^b(E)$ which, when it contains
$f$, $g$, also contains the functions $|f|$, $\inf(1,f)$, and $f \cdot g$. Therefore, $\mathcal{C}^c$ is a
Stone vector lattice of real functions on $E$. It is thus meaningful to determine
the $\mathcal{C}^c$-open subsets of $E$. For this we need:


7.4.2. Lemma. In a locally compact space $E$ let $K$ be a compact set
and $U$ a neighborhood of $K$. Then there is a function $f \in \mathcal{C}^c(E)$ with the
following properties: $0 \le f \le 1$, $f(x) = 1$ for all $x \in K$, and $S_f \subset U$. In
particular $S_f$ is a compact neighborhood of $K$.


Proof. We can assume that $U$ is open. In the one point compactification
$E'$ of $E$, the sets $K$ and $E' \setminus U$ are then disjoint and closed. From


9 See Franz [32], p. 75.
10 $E'$ is often defined only for noncompact $E$.
the normality of the compact space $E'$ follows the existence of an open
neighborhood $V$ of $K$ whose closure $\overline{V}$ is disjoint from $E' \setminus U$. Again the
normality of $E'$ yields the existence of a function $f' \in \mathcal{C}(E')$ such that
$0 \le f' \le 1$, $f'(x) = 1$ for all $x \in K$ and $f'(y) = 0$ for all $y \in E' \setminus V$. But
then $f = \operatorname{rest}_E f'$ yields the desired result: $S_f$ is a closed subset of $\overline{V}$ and
thus is compact in $E'$ and hence also in $E$ because $S_f \subset E$. Since $K \subset \{f > 0\} \subset S_f$
we see that $S_f$ is a compact neighborhood of $K$ in $E$. ∎


In analogy to Lemma 7.2.2 we now obtain: 


7.4.3. Lemma. A subset $G$ of a locally compact space $E$ is $\mathcal{C}^c(E)$-open
if and only if it is an open $K_\sigma$-set.


Proof. The step "(2) ⇒ (3)" in the proof of Lemma 7.2.2 (with
$u_n \in \mathcal{C}^c_+$) for $\mathcal{C}^c$-open $G$ shows that $G$ is an open $K_\sigma$-set. Note that for
$f \in \mathcal{C}^c$ and $\alpha > 0$, the set $\{|f| \ge \alpha\}$ is a closed subset of $S_f$ and is thus
compact. Conversely, let $G$ be an open $K_\sigma$-set and $(K_n)$ a sequence of
compact sets with $G = \bigcup K_n$. By Lemma 7.4.2, for every $n \in \mathbb{N}$ there
is an $f_n \in \mathcal{C}^c$ such that $1_{K_n} \le f_n \le 1_G$. Consequently, $1_G = \sup f_n$. But
then $u_n = \sup(f_1, \dots, f_n)$ yields an isotone sequence in $\mathcal{C}^c_+$ such that
$1_G = \sup u_n$. Hence $G$ is $\mathcal{C}^c$-open. ∎


Example 


1. Let $E$ be a discrete space. Open $K_\sigma$-sets and thus $\mathcal{C}^c(E)$-open sets in $E$
are just the countable subsets of $E$. On the other hand, all subsets of $E$ are
$\mathcal{C}(E)$-open. (See Section 7.2, Example 2.) The $\mathcal{C}^c(E)$-open sets thus coincide
with the $\mathcal{C}(E)$-open ones exactly if $E$ is countable.


The question now arises under which condition the $\mathcal{C}^c(E)$-open sets
coincide with the $\mathcal{C}(E)$-open sets. Since $1 \in \mathcal{C}(E)$, the entire space $E$ is
always $\mathcal{C}(E)$-open. Therefore, by Lemma 7.4.3, the condition that $E$ be a
$K_\sigma$-set is necessary for the identity of the two concepts of openness.


7.4.4. Definition. A locally compact space $E$ is said to be countable
at infinity if $E$ is a $K_\sigma$-set.


The following theorem shows that this condition is also sufficient: 


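7.4.5. Theorem. For every locally compact space $E$ countable at
infinity,

$$\mathfrak{A}(\mathcal{C}^c(E)) = \mathcal{B}_0(E). \tag{7.4.2}$$

More precisely, the $\mathcal{C}^c(E)$-open subsets coincide with the $\mathcal{C}(E)$-open
subsets.

Proof. By Lemma 7.1.3 the system $\mathcal{G}_{\mathcal{C}^c}$ [$\mathcal{G}_{\mathcal{C}}$] of $\mathcal{C}^c$-open [$\mathcal{C}$-open] sets
is a generator of $\mathfrak{A}(\mathcal{C}^c)$ [$\mathcal{B}_0(E)$]. $\mathcal{C}^c \subset \mathcal{C}$ implies $\mathcal{G}_{\mathcal{C}^c} \subset \mathcal{G}_{\mathcal{C}}$. Thus we still
must show that $\mathcal{G}_{\mathcal{C}} \subset \mathcal{G}_{\mathcal{C}^c}$. For this, let $G \in \mathcal{G}_{\mathcal{C}}$ and let $(u_n)$ be an associated
isotone sequence in $\mathcal{C}_+$ such that $1_G = \sup u_n$. Since $E$ is a $K_\sigma$-set,
then by Lemma 7.4.3 there is an isotone sequence $(k_n)$ in $\mathcal{C}^c_+$ with $1_E = 1 = \sup k_n$.
But then $f_n = u_n k_n$ ($n = 1, 2, \dots$) is an isotone sequence in
$\mathcal{C}^c_+$ such that $1_G = \sup f_n$. Thus $G$ lies in $\mathcal{G}_{\mathcal{C}^c}$. ∎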


Examples 


2. The following locally compact spaces are countable at infinity: 
(a) Every compact space. 


(b) The Euclidean $\mathbb{R}^p$, $p = 1, 2, \dots$. The closed spheres with fixed
center and radii $r = 1, 2, \dots$ are all compact and cover $\mathbb{R}^p$.


(c) Every locally compact space $E$ with a countable base $\mathcal{Q}$. Then $\mathcal{Q}^* = \{\overline{Q} : Q \in \mathcal{Q},\ Q \text{ relatively compact}\}$
is a countable system of compact sets
which covers $E$. The latter is obtained as follows: Every point $x \in E$ has
a compact neighborhood $V$. Since $\mathcal{Q}$ is a base of the topology, there is a
$Q \in \mathcal{Q}$ such that $x \in Q \subset V$. Thus $\overline{Q} \in \mathcal{Q}^*$ and $x \in \overline{Q} \subset V$.


3. According to Example 1, a discrete space is countable at infinity if 
and only if it is countable. 


The terminology "countable at infinity" is finally justified by the
following lemma and the associated remark:

7.4.6. Lemma. Let $E$ be a locally compact space countable at
infinity. Then there is an isotone sequence $(L_n)_{n \in \mathbb{N}}$ of compact Baire sets,
covering $E$, with the property

$$L_n \subset \mathring{L}_{n+1} \quad (n = 1, 2, \dots).^{11} \tag{7.4.3}$$

Every compact set $K \subset E$ is contained in $L_n$ for $n$ sufficiently large.


Proof. By Lemma 7.4.3, $E$ is a $\mathcal{C}^c$-open set. Thus there is an isotone
sequence $(u_n)$ in $\mathcal{C}^c_+$ such that $1 = \sup u_n$. But then the sequence defined
by

$$L_n = \Bigl\{ u_n \ge \frac{1}{n} \Bigr\} \quad (n = 1, 2, \dots)$$

yields the desired result. First $u_n$ is $\mathfrak{A}(\mathcal{C}^c)$-measurable, and thus $L_n$ is a


11 $\mathring{A}$ denotes the interior of a set $A$.
(closed) Baire set because $\mathfrak{A}(\mathcal{C}^c) = \mathcal{B}_0$; it is compact because $L_n \subset S_{u_n}$.
The isotonicity of $(u_n)$ implies

$$L_n \subset \Bigl\{ u_{n+1} \ge \frac{1}{n} \Bigr\} \subset \Bigl\{ u_{n+1} > \frac{1}{n+1} \Bigr\} \subset L_{n+1}$$

and hence obviously $L_n \subset \mathring{L}_{n+1}$. Finally, that every compact $K \subset E$ is
contained in one and thus in all $L_k$ for $k$ sufficiently large follows from the
remark that $Q_n = K \setminus \mathring{L}_n$, $n = 1, 2, \dots$, is an antitone sequence of
compact sets with empty intersection. Therefore, $Q_n$ must be empty for
$n$ sufficiently large. ∎


In other words, this lemma says the following: A locally compact
space $E$ is countable at infinity if and only if the point at infinity $\omega_0$ has a
countable fundamental system of neighborhoods in the one point compactification
$E'$.

For an arbitrary locally compact space $E$, besides $\mathcal{C}^c(E)$, the function
space $\mathcal{C}^0(E)$, to be defined in Definition 7.4.7, is also meaningful. For
its definition we associate with every bounded real function $f$ on $E$ the
norm of uniform convergence or sup-norm on $E$ defined by

$$\|f\| = \sup_{x \in E} |f(x)|.$$

Then the mapping $(f,g) \mapsto \|f - g\|$ gives $\mathcal{C}^b(E)$, and more generally the
vector space of bounded real functions on $E$, the structure of a metric
space. Therefore we also speak of the metric of uniform convergence. A
sequence $(f_n)$ of bounded real functions on $E$ converges uniformly on $E$
to a bounded real function $f$ if and only if $\lim_{n \to \infty} \|f_n - f\| = 0$.


7.4.7. Definition. A continuous real function $f$ on a locally compact
space $E$ is said to vanish at infinity if it lies in the closure $\mathcal{C}^0 = \mathcal{C}^0(E)$ of
$\mathcal{C}^c(E)$ relative to the metric of uniform convergence on $\mathcal{C}^b(E)$.


Hence we have

$$\mathcal{C}^c(E) \subset \mathcal{C}^0(E) \subset \mathcal{C}^b(E).$$


This terminology is justified by:


7.4.8. Theorem. For every real function f on a locally compact
space E, the following statements are equivalent:


(a) $f \in \mathcal{C}^0(E)$.


(b) $f \in \mathcal{C}(E)$ and for every real $\varepsilon > 0$, $\{|f| \ge \varepsilon\}$ is compact.

(c) The function

$$f'(x) = \begin{cases} f(x), & x \in E, \\ 0, & x = \omega_0, \end{cases}$$

is continuous on the one point compactification $E'$ of E.


Proof. (a) ⇒ (b): We need show only the second property. Let $\varepsilon > 0$.
By definition there is a $g \in \mathcal{C}^c(E)$ such that $\|f - g\| \le \varepsilon/2$. Due to the
inequality $|f(x)| - |g(x)| \le |f(x) - g(x)| \le \|f - g\|$, we have

$$\{|f| \ge \varepsilon\} \subset \{|g| \ge \varepsilon/2\} \subset S_g,$$

and thus the set $\{|f| \ge \varepsilon\}$ is relatively compact. But by the continuity of
f, the set is also closed and hence compact.


(b) ⇒ (c): Condition (b) says that f is continuous and that for every $\varepsilon > 0$
there exists a compact set $K \subset E$ with $|f(x)| < \varepsilon$ for all $x \in E \setminus K$. The
assertion now follows since $\{E' \setminus K : K \text{ compact} \subset E\}$ is a fundamental
system of neighborhoods of $\omega_0$ in $E'$.


(c) ⇒ (a): For every $\varepsilon > 0$ there is a compact set $K \subset E$ with
$|f(x)| = |f'(x) - f'(\omega_0)| \le \varepsilon$ for all $x \in E \setminus K$. By Lemma 7.4.2 there exists a
$g \in \mathcal{C}^c(E)$ satisfying $0 \le g \le 1$ and $g(x) = 1$ on K. But then fg lies in
$\mathcal{C}^c(E)$ and we have

$$|f(x)g(x) - f(x)| = |f(x)|\,(1 - g(x)) \le \varepsilon$$

for all $x \in E$; hence $\|fg - f\| \le \varepsilon$. This shows that f lies in $\mathcal{C}^0(E)$. ∎


From the definition or from property (c) it follows that $\mathcal{C}^0(E)$ is a linear
subspace of $\mathcal{C}^b(E)$, and it is, like $\mathcal{C}^c(E)$, a Stone vector lattice.
Finally, we obtain an analog of Lemma 7.4.2:


7.4.9. Theorem. A locally compact space E is countable at infinity
if and only if there exists a function $f \in \mathcal{C}^0(E)$ with $f(x) > 0$ for all $x \in E$.


Proof. Let f be such a function. Then $K_n = \{|f| \ge 1/n\}$, $n = 1, 2, \ldots$,
is an isotone sequence of compact sets such that $E = \bigcup K_n$. If,
conversely, E is countable at infinity, then there is an isotone sequence
$(K_n)$ of compact sets which covers E. By Lemma 7.4.2 there then exists a
sequence $(u_n)$ in $\mathcal{C}^c(E)$ satisfying $0 \le u_n \le 1$ and $u_n(x) = 1$ for all
$x \in K_n$ $(n = 1, 2, \ldots)$. Therefore, the series $\sum_{n=1}^{\infty} (1/2^n)\,u_n(x)$ is
uniformly convergent on E. Hence,

$$f = \sum_{n=1}^{\infty} \frac{1}{2^n}\, u_n$$

yields the desired result. ∎
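
The construction in the second half of this proof can be checked numerically in a simple case. The following is only a sketch under assumptions that are not part of the text: E is the real line, $K_n = [-n, n]$, and the bump functions `u` are piecewise linear tents chosen purely for illustration (Lemma 7.4.2 only guarantees that suitable functions exist).

```python
def u(n: int, x: float) -> float:
    """Hypothetical bump: equals 1 on K_n = [-n, n], vanishes outside [-n-1, n+1]."""
    return max(0.0, min(1.0, n + 1 - abs(x)))


def f(x: float, terms: int = 60) -> float:
    """Partial sum of sum_{n>=1} 2^{-n} u_n(x); the neglected tail is below 2^{-terms}."""
    return sum(u(n, x) / 2**n for n in range(1, terms + 1))


# f is strictly positive at every sampled point, yet becomes small far out,
# as expected of a strictly positive function vanishing at infinity.
samples = [f(float(x)) for x in range(-40, 41)]
```

With this choice, $u_n(x) = 0$ whenever $|x| \ge n + 1$, so only the terms with $n > |x| - 1$ contribute and $f(x)$ decays like a power of 1/2 as $|x|$ grows.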


PROBLEMS 


1. Let E be a discrete space. Determine the σ-algebra $\mathfrak{A}(\mathfrak{K})$ which is
generated by the system $\mathfrak{K}$ of all compact subsets of E. Prove: $\mathfrak{A}(\mathfrak{K}) = \mathfrak{B}(E)$
$(= \mathfrak{B}_0(E))$ if and only if E is countable.

2. Let E be a locally compact space. Prove: For $\mathcal{F} = \mathcal{C}^c(E)$, the space
$\mathcal{F}_0$ defined in Section 7.1, Problem 2 coincides with $\mathcal{C}^0(E)$.

3. Let K be a compact and U an open subset of a locally compact space
E such that $K \subset U$. Prove the existence of a compact $G_\delta$-set $K'$ and
an open $K_\sigma$-set $U'$ in E such that $K \subset U' \subset K' \subset U$.

4. Let E be a locally compact space countable at infinity. Prove: For
every $f \in \mathcal{C}_+(E)$, there exists an increasing sequence $(u_n)$ in $\mathcal{C}^c_+(E)$
such that $f = \sup u_n$.

5. Let $E' = E \cup \{\omega_0\}$ be the one point compactification of a locally
compact space E. Describe the Borel sets of $E'$ by means of the Borel
sets of E. Show that this description fits into the following framework:
Let $(E, \mathfrak{A})$ be a measurable space, $\omega_0 \notin E$ and $E' = E \cup \{\omega_0\}$.
Prove: The σ-algebra $\mathfrak{A}'$ in $E'$ generated by $\mathfrak{A}$ and $\{\omega_0\}$ consists of
all sets $A' \subset E'$ such that $A' \cap E \in \mathfrak{A}$.

6. Let $E' = E \cup \{\omega_0\}$ be the one point compactification of a locally
compact space E which is countable at infinity. Prove that, in the
notation of Problem 5, we have: $\mathfrak{B}_0(E') = (\mathfrak{B}_0(E))'$.


7.5 BAIRE MEASURES ON LOCALLY COMPACT SPACES
COUNTABLE AT INFINITY


After the topological preparations of the preceding section, we now
proceed to the study of measures on locally compact spaces E. Here we
are concerned essentially with an application of the results of Section 7.1
to the following special case: $\Omega = E$, $\mathcal{F} = \mathcal{C}^c(E)$. We generally assume E
to be countable at infinity, so that the σ-algebra $\mathfrak{A}(\mathcal{C}^c(E))$ is that of the
Baire sets of E (see Theorem 7.4.5). The starting point of our investigations
is the noteworthy fact that every positive linear form is an abstract
integral on the Stone vector lattice $\mathcal{C}^c(E)$.
7.5.1. Lemma. If E is a locally compact space and I is a positive
linear form on $\mathcal{C}^c(E)$, then for every compact set $K \subset E$ there exists a
real number $\alpha_K \ge 0$ such that

$$|I(u)| \le \alpha_K \|u\| \qquad (7.5.1)$$

for all $u \in \mathcal{C}^c(E)$ with $S_u \subset K$.


Proof. By Lemma 7.4.2, for K there exists an $f \in \mathcal{C}^c_+$ with $f(x) = 1$
for all $x \in K$. Therefore, for every $u \in \mathcal{C}^c$ with $S_u \subset K$, we have
$|u| \le \|u\| f$, whence $I(u) \le \|u\| I(f)$ and $-I(u) = I(-u) \le \|u\| I(f)$, and thus
$|I(u)| \le \|u\| I(f)$. Hence $\alpha_K = I(f)$ yields the desired result. ∎


7.5.2. Corollary. Every positive linear form I on $\mathcal{C}^c(E)$ is an abstract
integral.


Proof. According to (7.1.4), we have to show: For every antitone
sequence $(u_n)$ in $\mathcal{C}^c_+$ with $\inf u_n = 0$, we have $\inf I(u_n) = 0$. We show the
following sharper result:

For every nonempty set $\mathfrak{T} \subset \mathcal{C}^c_+$ which is filtering to the left$^{12}$
and satisfies $\inf_{t \in \mathfrak{T}} t(x) = 0$ for all $x \in E$, we have $\inf_{t \in \mathfrak{T}} I(t) = 0$.   (7.5.2)

We can assume without loss of generality that there exists a largest function
$t_0$ in $\mathfrak{T}$, since if necessary we can choose an arbitrary $t_0$ in $\mathfrak{T}$ and go
over to $\mathfrak{T}_0 = \{t \in \mathfrak{T} : t \le t_0\}$. Since $\mathfrak{T}$ is filtering to the left, we then have:
$\mathfrak{T}_0$ is filtering to the left and $\inf_{t \in \mathfrak{T}_0} t(x) = 0$ for all $x \in E$.

But if $t_0$ is the largest element of $\mathfrak{T}$, then $S_t \subset S_{t_0}$ for all $t \in \mathfrak{T}$. Referring
to Lemma 7.5.1, we therefore have to show only that

$$\inf_{t \in \mathfrak{T}} \|t\| = 0.$$

But this is just the statement of the so-called Theorem of Dini, which
will be proved briefly here: For every $x \in E$ and $\varepsilon > 0$, since $\inf_{t \in \mathfrak{T}} t(x) = 0$,
there is a $t_x \in \mathfrak{T}$ with $t_x(x) < \varepsilon$. $t_x$ is continuous; thus, there is an open
neighborhood $U_x$ of x such that $t_x(y) < \varepsilon$ for all $y \in U_x$. Now
$S_{t_0} \subset \bigcup_{x \in S_{t_0}} U_x$, and the compactness of $S_{t_0}$ implies the existence of finitely many
points $x_1, \ldots, x_n \in S_{t_0}$ with the property $S_{t_0} \subset U_{x_1} \cup \cdots \cup U_{x_n}$.
Since $\mathfrak{T}$ is filtering to the left, we find a $t \in \mathfrak{T}$ with $t \le \inf(t_{x_1}, \ldots, t_{x_n})$.
But then $t(y) < \varepsilon$ for all $y \in S_{t_0}$, and thus for all $y \in E$ because
$S_t \subset S_{t_0}$. If we note that $t \ge 0$, then $\|t\| \le \varepsilon$ and hence the assertion
follows. ∎


12 Filtering to the left means: For any two functions $t_1, t_2 \in \mathfrak{T}$, there is a $t \in \mathfrak{T}$
with $t \le t_1$ and $t \le t_2$.


By the Daniell-Stone Theorem for I, there exists a measure μ on $\mathfrak{A}(\mathcal{C}^c)$
with $\mathcal{C}^c \subset \mathcal{L}^1(\mu)$ and $I(u) = \int u \, d\mu$ for all $u \in \mathcal{C}^c$. Since a $u \in \mathcal{C}^c$ with
$1_K \le u$ exists for every compact set K (see Lemma 7.4.2), we have
$\mu(K) < +\infty$ for every compact set $K \in \mathfrak{A}(\mathcal{C}^c)$. This observation leads to the
following definition:


7.5.3. Definition. A Baire measure [Borel measure] on a locally
compact space E is any measure μ on $\mathfrak{B}_0(E)$ [$\mathfrak{B}(E)$] such that for all
compact Baire [all compact] sets K,

$$\mu(K) < +\infty. \qquad (7.5.3)$$


Examples


1. The L-B-measure $\lambda^p$ is a Borel measure on $\mathbb{R}^p$. It is not finite,
since $\lambda^p(\mathbb{R}^p) = +\infty$.


2. Every finite measure on $\mathfrak{B}_0(E)$ [$\mathfrak{B}(E)$] is a Baire [Borel] measure on
the locally compact space E. Therefore we can certainly retain the
terminology "finite Baire [Borel] measure."


3. The concepts of Baire and Borel measures coincide for a discrete space
E since $\mathfrak{B}_0(E) = \mathfrak{B}(E) = \mathfrak{P}(E)$ (Section 7.2, Example 2). Since, moreover,
the compact subsets of E coincide with the finite ones, a measure μ
on $\mathfrak{P}(E)$ is Borel if and only if $m(x) = \mu(\{x\})$ is finite for all points $x \in E$.
Therefore every Borel measure μ on E defines the real function $m \ge 0$ on
the discrete space E. If E is countable, μ is uniquely determined by m.
Then

$$\mu(A) = \sum_{x \in A} m(x) \quad (A \in \mathfrak{P}(E)).$$

In this case $\mu \to m$ is a bijection of the set of Borel measures on E onto
the set of all nonnegative real functions on E.


Now we obtain the important theorem, whose second part is called the 
F. Riesz Representation Theorem. 


7.5.4. Theorem. For every locally compact space E which is countable
at infinity, we have:


1. For every Baire measure μ, $\mathcal{C}^c(E) \subset \mathcal{L}^1(\mu)$ and hence $u \to \int u \, d\mu$ is
a positive linear form on $\mathcal{C}^c(E)$.

2. For every positive linear form I on $\mathcal{C}^c(E)$, there exists exactly one
Baire measure μ on E such that

$$I(u) = \int u \, d\mu \quad \text{for all } u \in \mathcal{C}^c(E). \qquad (7.5.4)$$


Proof. 1. $S_u$ is compact for every $u \in \mathcal{C}^c$, so that by Lemma 7.4.2
there is an $f \in \mathcal{C}^c_+$ with $f(x) = 1$ for all $x \in S_u$. But then $L = \{f \ge \tfrac12\}$
is a compact Baire set with $S_u \subset L$, and thus with $|u| \le \|u\| 1_L$. Hence
the μ-integrability of u follows, since u is $\mathfrak{A}(\mathcal{C}^c)$-measurable and
$\mathfrak{A}(\mathcal{C}^c) = \mathfrak{B}_0(E)$ by Theorem 7.4.5.$^{13}$


2. By the Daniell-Stone Theorem there exists a measure μ on $\mathfrak{A}(\mathcal{C}^c) = \mathfrak{B}_0(E)$
with $\mathcal{C}^c \subset \mathcal{L}^1(\mu)$ and property (7.5.4). The observation preceding
Definition 7.5.3 shows that μ is a Baire measure. The uniqueness of μ is
obtained from Corollary 7.1.6, taking into account 7.4.2 and 7.4.6. ∎


7.5.5. Corollary. Every Baire measure μ on a locally compact space
E countable at infinity is regular and σ-finite. For every $p \in [1, +\infty[$,
$\mathcal{C}^c(E)$ is dense in $\mathcal{L}^p(\mu)$ relative to convergence in pth mean.


Proof. By Corollary 7.1.6 and Lemma 7.4.3, μ is σ-finite. The outer
regularity of μ follows from Theorem 7.1.4 and Corollary 7.1.6. Thus for
every $A \in \mathfrak{B}_0$ with $\mu(A) < \infty$ and every $\varepsilon > 0$ there is an open set
$G \in \mathfrak{B}_0$ such that $A \subset G$ and $\mu(G \setminus A) < \varepsilon$. This statement also holds for arbitrary
$A \in \mathfrak{B}_0$: By the σ-finiteness of μ, A is the union of a sequence $(A_n)$ of
Baire sets with $\mu(A_n) < \infty$. Therefore for every $n = 1, 2, \ldots$ there
exists an open set $G_n \in \mathfrak{B}_0$ satisfying $A_n \subset G_n$ and $\mu(G_n \setminus A_n) \le 2^{-n}\varepsilon$.
But then $G = \bigcup G_n$ is an open Baire set with $A \subset G$ and

$$\mu(G \setminus A) \le \mu\Bigl(\bigcup (G_n \setminus A_n)\Bigr) \le \sum \mu(G_n \setminus A_n) \le \varepsilon.$$

Thus for $\complement A$ there is an open set $H \in \mathfrak{B}_0$ with $\complement A \subset H$ and
$\mu(H \setminus \complement A) < \varepsilon$. Then $F = \complement H$ is a closed Baire set with $F \subset A$ and
$\mu(A \setminus F) < \varepsilon$. Hence

$$\mu(A) = \sup \{\mu(F) : F \text{ closed} \subset A,\ F \in \mathfrak{B}_0\}. \qquad (7.5.5)$$

The existence of an isotone sequence $(L_n)$ of compact Baire sets with
$E = \bigcup L_n$ was shown in Lemma 7.4.6. Then for every closed $F \in \mathfrak{B}_0$,
$(F \cap L_n)$ is an isotone sequence of compact sets in $\mathfrak{B}_0$ with F as its union
and thus

$$\mu(F) = \sup_n \mu(F \cap L_n). \qquad (7.5.6)$$

Therefore (7.5.5) and (7.5.6) imply the desired inner regularity of μ.

The rest of the assertion follows from Corollary 7.1.5: When u lies in
$\mathcal{C}^c$, $|u|^p$ also lies in $\mathcal{C}^c$; moreover, we have $\mathcal{C}^c \subset \mathcal{L}^1(\mu)$. Hence it follows
that $\mathcal{C}^c \subset \mathcal{L}^p(\mu)$, and thus we have the denseness of $\mathcal{C}^c = \mathcal{C}^c \cap \mathcal{L}^p(\mu)$ in
$\mathcal{L}^p(\mu)$ relative to convergence in pth mean. ∎


13 Since $\mathfrak{A}(\mathcal{C}^c) \subset \mathfrak{B}_0(E)$ holds in general, the proof shows the validity of 1 even
without the hypothesis of countability at infinity.


Thus we have verified regularity for another extensive class of measures. 


Examples 


4. Let E be a discrete uncountable space. On $\mathfrak{B}_0(E) = \mathfrak{B}(E) = \mathfrak{P}(E)$,
we consider the Borel measure μ with $\mu(A) = 0$ or $= +\infty$ depending
on whether A is countable or uncountable. Then μ is not regular since
$\mu(K) = 0$ holds for all compact (that is, finite, here) subsets K. The
regularity of Baire measures proved in Corollary 7.5.5 is thus lost if we
eliminate the countability at infinity.


5. Once again let E be a discrete, uncountable space and $E'$ its one
point compactification. E is open in $E'$ and hence Borel in $E'$. But E is
not Baire in $E'$. Otherwise, by (7.2.4) we would have

$$\mu(E) = \inf \{\mu(G) : E \subset G,\ G \text{ open } K_\sigma\text{-set}\}$$

for every Baire measure μ on $E'$. Since E is uncountable, E cannot be a
$K_\sigma$-set in $E'$. Thus

$$\mu(E) = \mu(E').$$

But this is false for the Baire measure defined by the unit mass at $\omega_0$.
Thus we have $\mathfrak{B}_0(E') \ne \mathfrak{B}(E')$ for the compact space $E'$.


Remark. If E is an arbitrary locally compact space and I is a positive
linear form on $\mathcal{C}^c(E)$, then by making essential use of (7.5.2) one can still
prove the existence, though not the uniqueness, of Borel measures μ on E
which represent I in the sense of the following two properties: $\mathcal{C}^c \subset \mathcal{L}^1(\mu)$
and $I(u) = \int u \, d\mu$ for all $u \in \mathcal{C}^c$. However, among these measures there exists
only one inner regular Borel measure $\underline{\mu}$ and only one outer regular Borel
measure $\overline{\mu}$. One has $\underline{\mu} \le \overline{\mu}$, and $\underline{\mu} \le \mu \le \overline{\mu}$ holds for every Borel measure
μ on E which represents I. The equality $\underline{\mu}(A) = \overline{\mu}(A)$ holds for all Borel
sets A in E which can be covered by a sequence of compact sets. This is
the reason why some authors (for example, Halmos [3]) only call these
sets Borel sets. The two different definitions of Borel sets only coincide
when E is countable at infinity. Furthermore, $\underline{\mu}(A) = \overline{\mu}(A)$ holds for a
Borel set $A \in \mathfrak{B}(E)$ whenever $\overline{\mu}(A) < +\infty$. As a consequence one
obtains the following result: Assume that I is bounded, that is, there exists
a real number $\alpha \ge 0$ such that $|I(u)| \le \alpha \|u\|$ holds for all $u \in \mathcal{C}^c(E)$
(compare Section 7.5, Problem 1). Then there exists one and only one
regular Borel measure μ on E such that $\mathcal{C}^c \subset \mathcal{L}^1(\mu)$ and $I(u) = \int u \, d\mu$ for
all $u \in \mathcal{C}^c$. This measure μ is finite. The reader interested in details is
referred to the pertinent literature; in particular to Bourbaki [1] and
Courrège [2].


These results explain why Bourbaki calls every positive linear form on
$\mathcal{C}^c(E)$ a Radon measure on E.


PROBLEMS 


In what follows let E be a locally compact space countable at infinity. 


1. For a Baire measure μ on E the number $\|\mu\| = \mu(E)$ is called the total
mass of μ (hence $0 \le \|\mu\| \le +\infty$). It is finite if and only if μ is finite.
[For the special case $E = \mathbb{R}^p$ see (3.4.1).]

Consider a Baire measure μ on E and prove the equivalence of the
following conditions:

(a) μ is finite.

(b) There exists a real number $\alpha \ge 0$ such that $|\int u \, d\mu| \le \alpha \|u\|$ for
all $u \in \mathcal{C}^c(E)$.

Prove furthermore: The smallest possible number α in (b) is $\|\mu\|$
and one has $\|\mu\| = \sup \{\int u \, d\mu : 0 \le u \le 1,\ u \in \mathcal{C}^c(E)\}$.

2. Prove: For a Baire measure μ on E the following three conditions are
equivalent:

(a) μ is finite.
(b) $\mathcal{C}^b(E) \subset \mathcal{L}^1(\mu)$.
(c) $\mathcal{C}^0(E) \subset \mathcal{L}^1(\mu)$.

3. Let I be a positive linear form on $\mathcal{C}^0(E)$. Prove: There exists a unique
finite Baire measure μ on E such that $I(f) = \int f \, d\mu$ for all $f \in \mathcal{C}^0(E)$.
(Hint: For each $\varepsilon > 0$ and $f \in \mathcal{C}^0_+(E)$, there exists a function $v \in \mathcal{C}^c(E)$
such that $|f - v| \le \varepsilon \sqrt{f}$.)

4. Let $E_1$ and $E_2$ be locally compact spaces that are countable at infinity.
A continuous mapping $T: E_1 \to E_2$ is called proper if $T^{-1}(L)$ is compact
in $E_1$ for all compact sets L in $E_2$. Prove:

(a) Every continuous mapping $T: E_1 \to E_2$ is $\mathfrak{B}_0(E_1)$-$\mathfrak{B}_0(E_2)$-measurable.

(b) For every proper mapping $T: E_1 \to E_2$ and every Baire measure
μ on $E_1$, the image measure $T(\mu)$ is a Baire measure on $E_2$.
7.6 THE SPECIAL CASE OF LOCALLY COMPACT SPACES
WITH COUNTABLE BASE


The theory developed in the preceding section is simplified in an essen- 
tial way when E is a locally compact space with a countable base. By 
Section 7.4, Example 2(c), every such space is countable at infinity. 
Examples of such spaces are: the Euclidean $\mathbb{R}^p$ $(p = 1, 2, \ldots)$, every
(abstract) Riemann surface (see Ahlfors-Sario [21], p. 144), the countable 
discrete spaces, the metrizable compact spaces. (By the theorem of 
Urysohn quoted in Section 7.3, Example 6, a compact space is metrizable 
if and only if it has a countable base.) 

The announced simplification is due above all to the fact that the Baire 
and Borel sets coincide in a locally compact space with a countable base. 
We show an even stronger result: 


7.6.1. Theorem. Every locally compact space E with a countable
base is a Polish space.


Proof. Let $\mathfrak{G}$ be a countable base for E and let $(L_n)$ be an isotone
sequence of compact sets (which exists by Lemma 7.4.6) such that every
compact subset of E is contained in an $L_n$. Then obviously

$$\mathfrak{G} \cup \{E' \setminus L_n : n = 1, 2, \ldots\}$$

is a countable base for the one point compactification $E'$ of E. Since $E'$ is
compact and E is open in $E'$, the assertion now follows from Examples 6
and 4 of Section 7.3. ∎


7.6.2. Corollary. For every locally compact space E with a countable
base, $\mathfrak{B}_0(E) = \mathfrak{B}(E)$. The system of compact subsets of E is a generator
of $\mathfrak{B}(E)$.


Proof. The first part of the assertion follows from Corollary 7.2.4.
By definition, $\mathfrak{B}(E)$ is generated by the open and thus by the closed subsets
of E. Since, in particular, E is countable at infinity, every closed subset
is a $K_\sigma$-set. Hence $\mathfrak{B}(E)$ is generated by the compact sets. ∎


Thus, for locally compact spaces E with a countable base, the concepts
of Baire and Borel measure also coincide. By Corollary 7.5.5, every Borel
measure on E is regular. For finite Borel measures, the regularity also
follows from Theorem 7.3.3.

Finally, we show which property of $\mathcal{C}^c(E)$ reflects the existence of a
countable base for E. For this we again use the metric of uniform
convergence.


7.6.3. Theorem. For every locally compact space E, the following
two properties are equivalent:


(a) E has a countable base.


(b) With respect to uniform convergence there exists a countable, dense
subset of $\mathcal{C}^c(E)$.$^{14}$


Proof. (a) ⇒ (b): Let $\mathfrak{G}$ be a countable base for E. The arguments
presented in Section 7.4, Example 2(c), show that without loss of generality
we can assume all sets in $\mathfrak{G}$ to be relatively compact. Then the set I
of all pairs $(G, H) \in \mathfrak{G} \times \mathfrak{G}$ with the property $\overline{G} \subset H$ is countable. With
every pair $(G, H) \in I$ we associate a function $f_{(G,H)} \in \mathcal{C}^c$, which exists by
Lemma 7.4.2, with the following properties: $0 \le f_{(G,H)} \le 1$, $f_{(G,H)}(x) = 1$
for all $x \in G$, $S_{f_{(G,H)}} \subset H$. Then for every point $x \in E$ or every pair $x_1$,
$x_2$ of distinct points of E there exists a function $f_i$, $i \in I$, with $f_i(x) \ne 0$
or $f_i(x_1) \ne f_i(x_2)$, respectively. Indeed there exists a pair $(G, H) \in I$ with
$x \in G$ and hence with $f_{(G,H)}(x) = 1$, or with $x_1 \in G$ and $x_2 \notin H$, and
thus with the property $f_{(G,H)}(x_1) = 1$, $f_{(G,H)}(x_2) = 0$.

Now let $\mathfrak{D}$ be the set of all real functions, defined on E, of the form
$p(f_{i_1}, \ldots, f_{i_n})$, where $p \in \mathbb{R}[x_1, \ldots, x_n]$ is a real polynomial in finitely
many variables $x_1, \ldots, x_n$ such that $p(0, \ldots, 0) = 0$ and $i_1, \ldots, i_n$
are elements of I. Then $\mathfrak{D}$ obviously satisfies the hypotheses of the Stone-Weierstrass
Theorem$^{15}$ and thus is dense in $\mathcal{C}^0$. By what was said preceding
Lemma 7.4.2, we also have $\mathfrak{D} \subset \mathcal{C}^c$. Thus $\mathfrak{D}$ is dense in $\mathcal{C}^c$ with
respect to uniform convergence. If in the definition of $\mathfrak{D}$ we now replace
the real polynomials p by ones with only rational coefficients, we obtain a
subset $\mathfrak{D}'$ of $\mathfrak{D}$ which is obviously countable. $\mathfrak{D}'$ yields the desired result
since $\mathfrak{D}'$ is dense in $\mathfrak{D}$ and thus also in $\mathcal{C}^c$. We see the former as follows: Let
$g = p(f_{i_1}, \ldots, f_{i_n})$ be a function in $\mathfrak{D}$. Like all functions from $\mathcal{C}^c$, the
functions $f_{i_1}, \ldots, f_{i_n}$ are bounded. If we approximate all the coefficients
of p sufficiently closely by rational numbers, then we obtain an arbitrarily
close uniform approximation of g by functions $p'(f_{i_1}, \ldots, f_{i_n})$ from $\mathfrak{D}'$.


14 Since $\mathcal{C}^c(E)$ is a metric space, this is equivalent to the existence of a countable
base for $\mathcal{C}^c(E)$. See Franz [32], p. 50.

15 In the form needed by us this theorem says: Let E be a locally compact space and
$\mathfrak{D}$ a linear subspace of $\mathcal{C}^0(E)$ which, whenever it contains two functions f, g,
also contains their product fg. Then if, for every pair of distinct points $x, y \in E$ there
exists a function $f \in \mathfrak{D}$ with $f(x) \ne f(y)$ and for every $x \in E$ there exists a $g \in \mathfrak{D}$
with $g(x) \ne 0$, it follows that $\mathfrak{D}$ is dense in $\mathcal{C}^0(E)$ relative to the metric of uniform
convergence on E. This form of the theorem is obtained from the usual formulation
for compact spaces (see, for example, Bourbaki [26]) by introducing the one point
compactification $E'$ of E and extending every function $f \in \mathfrak{D}$ to a continuous function
$f'$ on $E'$ (which is equal to zero at the point at infinity). Then $\mathfrak{D}_1 = \{f' + \alpha : f \in \mathfrak{D},\ \alpha \in \mathbb{R}\}$
is a linear subspace of $\mathcal{C}(E')$ with the properties: $f', g' \in \mathfrak{D}_1 \Rightarrow f'g' \in \mathfrak{D}_1$; the
constant real functions lie in $\mathfrak{D}_1$; for $x \ne y$ in $E'$ there exists an $f' \in \mathfrak{D}_1$ with $f'(x) \ne f'(y)$.


(b) ⇒ (a): Let D be a countable dense subset of $\mathcal{C}^c$. Then the system $\mathfrak{G}$ of all sets
$\{u > \tfrac12\}$ with $u \in D$ is a base for E. Indeed for every open set U and
every $x_0 \in U$ there exists, by Lemma 7.4.2, an $f \in \mathcal{C}^c$ such that $f(x_0) = 1$
and $S_f \subset U$. For f there is a $u \in D$ with $\|f - u\| < \tfrac12$. But then we have

$$x_0 \in \{u > \tfrac12\} \subset \{f > 0\} \subset S_f \subset U.$$

This shows that $\mathfrak{G}$ is a base. But then, since D is countable, $\mathfrak{G}$ is also
countable. ∎


PROBLEMS 


1. Let μ be a Borel measure on a locally compact space E with a countable
base. Prove:

(a) There exists a greatest open set G of measure zero. $S_\mu = \complement G$ is
called the support of μ.

(b) A point $x \in E$ is in $S_\mu$ if and only if $\mu(U) > 0$ for all open
neighborhoods U of x.

(c) For a function $f \in \mathcal{C}^c_+(E)$ one has $\int f \, d\mu = 0$ if and only if f
vanishes on $S_\mu$. Determine $S_\mu$ for the L-B-measure $\lambda^p$ on $\mathbb{R}^p$ and
$S_{\varepsilon_a}$ for a point a in a locally compact space with a countable base.

2. Let E be locally compact and $D \subset \mathcal{C}^c(E)$. D is called rich if for every
compact set K there exists an open relatively compact neighborhood
U of K such that every $f \in \mathcal{C}^c(E)$ with support $S_f \subset K$ can be uniformly
approximated on E by functions $d \in D$ having their support
in U.

(a) Assume that E is countable at infinity and that $D \subset \mathcal{C}^c(E)$ is
rich. Prove: Two Baire measures μ and ν coincide if and only if
$\int u \, d\mu = \int u \, d\nu$ for all $u \in D$.

(b) Prove: For every locally compact space E with a countable base
there exists a countable set $D \subset \mathcal{C}^c(E)$ which is rich. [Hint:
Modify slightly the proof of Theorem 7.6.3.]

(c) Prove: In the situation of (b) one can even assume that for every
compact set $K \subset E$ there exists a function $d \in D$ such that
$0 \le d \le 1$ and $d(x) = 1$ for all $x \in K$.




7.7 CONVERGENCE OF BAIRE MEASURES 


Let E be a locally compact space countable at infinity. Henceforth, let
$\mathfrak{M}_+ = \mathfrak{M}_+(E)$ denote the set of all Baire measures on E. By the Riesz
Representation Theorem, $\mathfrak{M}_+(E)$ can be canonically mapped bijectively
onto the set of all positive linear forms on $\mathcal{C}^c = \mathcal{C}^c(E)$. For $\mu, \nu \in \mathfrak{M}_+(E)$
and any two real numbers $\alpha \ge 0$, $\beta \ge 0$, $\alpha\mu + \beta\nu$ also lies in $\mathfrak{M}_+(E)$.
Thus, $\mathfrak{M}_+(E)$ is a so-called convex cone. Besides $\mathfrak{M}_+(E)$, we often consider
the following two subsets:

$$\mathfrak{M}^b_+ = \mathfrak{M}^b_+(E) = \{\mu \in \mathfrak{M}_+(E) : \mu(E) < \infty\},$$
$$\mathfrak{M}^1_+ = \mathfrak{M}^1_+(E) = \{\mu \in \mathfrak{M}_+(E) : \mu(E) = 1\}.$$

These are correspondingly the sets of all finite Baire measures and of all
Baire probability measures on E. Obviously, $\mathfrak{M}^1_+(E) \subset \mathfrak{M}^b_+(E) \subset \mathfrak{M}_+(E)$.$^{16}$
In particular, all measures $\varepsilon_a$ defined by the unit mass at $a \in E$ lie in
$\mathfrak{M}^1_+(E)$. $\mathfrak{M}^b_+(E)$ is a convex sub-cone of $\mathfrak{M}_+(E)$.

Depending on whether we consider the elements of $\mathfrak{M}_+(E)$ as measures
on $\mathfrak{B}_0(E)$ or as positive linear forms on $\mathcal{C}^c(E)$, the following two concepts
of convergence in $\mathfrak{M}_+(E)$ suggest themselves: We can define the convergence
of a sequence $(\mu_n)$ in $\mathfrak{M}_+(E)$ to a measure $\mu \in \mathfrak{M}_+(E)$ either by
the requirement $\lim \mu_n(A) = \mu(A)$ for all $A \in \mathfrak{B}_0(E)$ or by
$\lim \int f \, d\mu_n = \int f \, d\mu$ for all $f \in \mathcal{C}^c(E)$. It will be seen immediately that the first concept
of convergence is of no further interest, while the second is of high
importance.


7.7.1. Definition. A sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathfrak{M}_+(E)$ is said to be vaguely
convergent to a measure $\mu \in \mathfrak{M}_+(E)$ if

$$\lim_{n \to \infty} \int f \, d\mu_n = \int f \, d\mu \quad \text{for all } f \in \mathcal{C}^c(E). \qquad (7.7.1)$$

A sequence $(\mu_n)$ in $\mathfrak{M}_+(E)$ is vaguely convergent if and only if the sequence
$(\int f \, d\mu_n)$ of real numbers converges for every $f \in \mathcal{C}^c(E)$. In fact, then
$f \to \lim_{n \to \infty} \int f \, d\mu_n$ is a positive linear form on $\mathcal{C}^c$. By the Riesz Representation
Theorem there is thus exactly one $\mu \in \mathfrak{M}_+$ to which $(\mu_n)$ converges
vaguely. At the same time it also follows that the vague limit of a sequence
in $\mathfrak{M}_+(E)$ is uniquely determined.


Examples 


1. Let $(x_n)$ be a convergent sequence in E with $\lim x_n = x$. Then $(\varepsilon_{x_n})$
converges vaguely to $\varepsilon_x$, since we have $\int f \, d\varepsilon_a = f(a)$ for all $a \in E$, and
$\lim f(x_n) = f(x)$ for arbitrary $f \in \mathcal{C}^c$.

On the other hand, $\lim \varepsilon_{x_n}(A) = \varepsilon_x(A)$ does not hold in general for all
$A \in \mathfrak{B}_0(E)$. If E has a countable base and if $x_n \ne x$ for all $n = 1, 2, \ldots$,
then it suffices to choose $A = \{x\}$. (A is Borel and thus, in this case,
Baire.) This example shows why we must attribute little significance to
the first of the concepts of convergence above.


16 $\mathfrak{M}^b_+(E)$ was already introduced in Section 3.4 for the case $E = \mathbb{R}^p$.

2. Let $(\alpha_n)$ be an arbitrary sequence of real numbers $\ge 0$ and $(x_n)$ a
sequence in E such that for every compact set $K \subset E$ there exist at most
finitely many $n = 1, 2, \ldots$ with $x_n \in K$. (In other words, E is not
compact, and $\lim x_n = \omega_0$ in $E'$.) Then the sequence of measures
$\mu_n = \alpha_n \varepsilon_{x_n}$, $n = 1, 2, \ldots$, is vaguely convergent to the zero measure 0. For
arbitrary $f \in \mathcal{C}^c$, we have $\int f \, d\mu_n = \alpha_n f(x_n) = 0$ for all n with $x_n \notin S_f$.
Since $S_f$ is compact, we thus have by our hypothesis $\int f \, d\mu_n = 0$ for all n
sufficiently large.

This example shows (for $\alpha_n = 1$) that vaguely convergent sequences of
measures in $\mathfrak{M}^1_+(E)$ need not converge to a measure in $\mathfrak{M}^1_+(E)$.
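
The loss of mass in Example 2 can be made concrete. The following is a small sketch under assumptions not taken from the text: $E = \mathbb{R}$, $\alpha_n = 1$, $x_n = n$, and a single tent-shaped test function `tent` supported in $[-1, 1]$; the helper `integrate_point_mass` is a hypothetical name for the elementary identity $\int f \, d(\alpha \varepsilon_x) = \alpha f(x)$.

```python
def tent(x: float) -> float:
    """A test function in C^c(R): continuous, with support [-1, 1] and tent(0) = 1."""
    return max(0.0, 1.0 - abs(x))


def integrate_point_mass(f, alpha: float, x: float) -> float:
    """Integral of f against the measure alpha * eps_x, which is alpha * f(x)."""
    return alpha * f(x)


# mu_n = eps_n for n = 0, 1, ..., 5: every mu_n has total mass 1,
# but the integrals of the fixed test function vanish once n leaves its support.
integrals = [integrate_point_mass(tent, 1.0, float(n)) for n in range(6)]
```

Each $\mu_n$ is a probability measure, yet the integrals against this test function are zero from $n = 1$ on, in line with vague convergence to the zero measure.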


An important example of vague convergence is provided by the
stochastic convergence, introduced in Definition 2.11.2, of a sequence $(X_n)$
of real random variables on a probability space $(\Omega, \mathfrak{A}, P)$. Here the sequence
$(P_{X_n})$ of associated distributions is a sequence in $\mathfrak{M}^1_+(\mathbb{R})$.


7.7.2. Theorem. If a sequence $(X_n)_{n \in \mathbb{N}}$ of real random variables on
a probability space $(\Omega, \mathfrak{A}, P)$ converges stochastically to the real random
variable X, then the sequence $(P_{X_n})_{n \in \mathbb{N}}$ of distributions converges vaguely
to the distribution $P_X$ of X. If X is P-almost surely constant, then the
converse also holds.


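Proof. Every function $f \in \mathcal{C}^c(\mathbb{R})$ is continuous and vanishes outside
its compact support. Therefore f is uniformly continuous on $\mathbb{R}$, so that
for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|x' - x''| < \delta$ implies

$$|f(x') - f(x'')| \le \varepsilon \quad (x', x'' \in \mathbb{R}).$$

If we now set $A_n = \{|X_n - X| \ge \delta\}$, then $A_n \in \mathfrak{A}$ and

$$\Bigl|\int f \, dP_{X_n} - \int f \, dP_X\Bigr| = \Bigl|\int f \circ X_n \, dP - \int f \circ X \, dP\Bigr|$$
$$\le \int_{A_n} |f \circ X_n - f \circ X| \, dP + \int_{\complement A_n} |f \circ X_n - f \circ X| \, dP$$
$$\le 2\|f\| P(A_n) + \varepsilon P(\complement A_n) \le 2\|f\| P(A_n) + \varepsilon.$$

Here we used the trivial inequality $|f \circ X_n - f \circ X| \le |f \circ X_n| + |f \circ X| \le 2\|f\|$.
By the definition of stochastic convergence, $\lim_{n \to \infty} P(A_n) = 0$. Thus
$\lim_{n \to \infty} \int f \, dP_{X_n} = \int f \, dP_X$ and the asserted vague convergence follows.

For the proof of the converse, let $X = a$ P-almost surely for some $a \in \mathbb{R}$,
that is, $P_X = \varepsilon_a$. Suppose further that $(P_{X_n})$ converges vaguely to $\varepsilon_a$. For
the interval $I = \,]a - \varepsilon, a + \varepsilon[$ there are obviously functions $f, g \in \mathcal{C}^c(\mathbb{R})$
satisfying $f \le 1_I \le g$ and $f(a) = g(a) = 1$. Then $f \circ X_n \le 1_I \circ X_n \le g \circ X_n$,
and consequently

$$\int f \, dP_{X_n} \le \int 1_I \circ X_n \, dP \le \int g \, dP_{X_n}.$$

Hence

$$\lim_{n \to \infty} P\{X_n \in I\} = 1$$

because of the convergence of the outer terms to $f(a) = g(a) = 1$. If we
note that $\{X_n \in I\} = \{|X_n - a| < \varepsilon\}$ and thus

$$P\{|X_n - X| \ge \varepsilon\} = P\{|X_n - a| \ge \varepsilon\} = 1 - P\{X_n \in I\},$$

then we obtain $\lim_{n \to \infty} P\{|X_n - X| \ge \varepsilon\} = 0$ for all $\varepsilon > 0$. ∎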


If X is not almost surely constant, then we cannot in general derive the 
stochastic convergence of (X,) to X from the vague convergence of (Px,) 
to Px. This is shown by the following: 


Example 


3. Let $(\Omega, \mathfrak{A}, P)$ be the probability space defined in Section 5.1, Example
3, $(A_n)$ the independent sequence of events given there, and $X_n = 1_{A_n}$. By
the example in Section 6.4, every $X_n$ has distribution $\beta_1^p$ with $p = \tfrac12$, and
thus the sequence $(P_{X_n})$ is constant equal to $\beta_1^p$ and hence vaguely convergent
to $P_{X_1}$. On the other hand, $(X_n)$ is not stochastically convergent,
since for any two natural numbers $m \ne n$ and every δ with $0 < \delta < 1$, we
obviously have $P\{|X_m - X_n| \ge \delta\} = \tfrac12 \cdot \tfrac12 + \tfrac12 \cdot \tfrac12 = \tfrac12$.
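
The probability computed at the end of Example 3 can be verified by enumerating the four outcomes of two independent indicator variables. This is only an illustrative sketch; the function name `prob_differ` and the choice $\delta = \tfrac12$ as a representative of $0 < \delta < 1$ are assumptions made here, not part of the text.

```python
from fractions import Fraction
from itertools import product


def prob_differ(p: Fraction) -> Fraction:
    """P{|X_m - X_n| >= delta} for independent 0-1 variables with success probability p.

    For 0 < delta < 1 the event is simply {X_m != X_n}; delta = 1/2 is used here.
    """
    delta = Fraction(1, 2)
    total = Fraction(0)
    for xm, xn in product((0, 1), repeat=2):
        weight = (p if xm else 1 - p) * (p if xn else 1 - p)
        if abs(xm - xn) >= delta:
            total += weight
    return total


half = prob_differ(Fraction(1, 2))
```

Because this probability does not depend on m and n, it cannot tend to zero, which is exactly why the sequence fails to be a Cauchy sequence in probability.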

The vague convergence of sequences in $\mathfrak{M}_+(E)$ is derived from a topology
on $\mathfrak{M}_+(E)$, the so-called vague topology. This is defined as the coarsest
topology on $\mathfrak{M}_+(E)$ relative to which all of the mappings

$$\mu \to \int f \, d\mu \quad (f \in \mathcal{C}^c(E)) \qquad (7.7.2)$$

are continuous. Thus for every measure $\mu_0 \in \mathfrak{M}_+(E)$ we obtain a fundamental
system of neighborhoods of $\mu_0$ in the vague topology in the form
of the system of all sets

$$V_{f_1, \ldots, f_n; \varepsilon}(\mu_0) = \Bigl\{\mu \in \mathfrak{M}_+(E) : \Bigl|\int f_i \, d\mu - \int f_i \, d\mu_0\Bigr| \le \varepsilon;\ i = 1, \ldots, n\Bigr\}, \qquad (7.7.3)$$

where $f_1, \ldots, f_n$ are finitely many functions from $\mathcal{C}^c(E)$ and $\varepsilon > 0$ is a
real number. The vague topology is Hausdorff since, by the Riesz Representation
Theorem, for distinct measures $\mu, \nu \in \mathfrak{M}_+(E)$, there exists an
$f \in \mathcal{C}^c(E)$ such that $\int f \, d\mu \ne \int f \, d\nu$.

Hence it is also clear what is meant by the vague convergence of a
mapping $t \to \mu_t$ of a subset A of a topological space T into $\mathfrak{M}_+(E)$ as t
converges to a point $t_0$ in the closure of A. With respect to the vague topology,
the convergence

$$\lim_{t \to t_0,\ t \in A} \mu_t = \mu \in \mathfrak{M}_+(E)$$

means just

$$\lim_{t \to t_0,\ t \in A} \int f \, d\mu_t = \int f \, d\mu \quad \text{for all } f \in \mathcal{C}^c(E). \qquad (7.7.4)$$
Example 


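4. Let $K \ge 0$ be a $\lambda^p$-integrable real function, defined on $E = \mathbb{R}^p$, with
$\int K \, d\lambda^p = 1$ (for example, the indicator function of the unit cube [0,1]).
For every real $r > 0$, we set

$$K_r(x) = r^p K(rx) \quad (x \in \mathbb{R}^p).$$

Then $K_r \ge 0$ and $K_r$ is $\lambda^p$-integrable, and $\int K_r \, d\lambda^p = 1$. We need only
take note of (1.7.9), where for the homothetic mapping $x \to H_r(x) = rx$
of $\mathbb{R}^p$ onto itself we have $H_r(\lambda^p) = r^{-p} \lambda^p$. But then

$$\int K_r \, d\lambda^p = r^p \int K \circ H_r \, d\lambda^p = r^p \int K \, dH_r(\lambda^p) = 1.$$

In the sense of the vague topology we have

$$\lim_{r \to +\infty} K_r \lambda^p = \varepsilon_0.^{17}$$

Here $r \to K_r \lambda^p$ is a mapping of $]0, +\infty[$ into $\mathfrak{M}^1_+(\mathbb{R}^p)$. For the proof we
note that for every $f \in \mathcal{C}^c(\mathbb{R}^p)$, we have the equality

$$\int f K_r \, d\lambda^p = r^p \int f \cdot (K \circ H_r) \, d\lambda^p = r^p \int (f \circ H_r^{-1}) K \, dH_r(\lambda^p)
= \int (f \circ H_r^{-1}) K \, d\lambda^p = \int f(x/r) K(x) \, \lambda^p(dx).$$

This implies the assertion, using the Lebesgue Convergence Theorem,
since, on the one hand, $\lim_{r \to +\infty} f(x/r) K(x) = f(0) K(x)$ for all $x \in \mathbb{R}^p$ and,
on the other hand, for all $r > 0$,

$$|f(x/r) K(x)| \le \|f\| K(x).$$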
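
The limit in Example 4 can be observed numerically in the simplest case. The sketch below rests on assumptions made only for illustration: $p = 1$, K is the indicator function of $[0, 1]$, the test function is a tent with $f(0) = 1$, and the integral $\int f(x/r) K(x) \, \lambda(dx)$ is evaluated by a midpoint rule. The names `tent` and `smoothed` are hypothetical.

```python
def tent(x: float) -> float:
    """Test function in C^c(R) with support [-1, 1] and tent(0) = 1."""
    return max(0.0, 1.0 - abs(x))


def smoothed(f, r: float, steps: int = 10_000) -> float:
    """Midpoint rule for the integral of f(x / r) over [0, 1], i.e. with K = 1_[0,1]."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h / r) for i in range(steps)) * h


# For r >= 1 the exact value is 1 - 1/(2r), which tends to tent(0) = 1.
values = {r: smoothed(tent, r) for r in (1.0, 10.0, 1000.0)}
```

For this particular integrand the midpoint rule is exact up to rounding, because $f(x/r)$ is linear on $[0, 1]$ when $r \ge 1$, so the computed values follow $1 - 1/(2r)$ closely.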


In particular, all discrete Baire measures on E belong to $\mathfrak{M}_+(E)$. These
are the measures δ which can be represented in the form

$$\delta = \sum_{i=1}^{k} \alpha_i \varepsilon_{x_i}$$

by means of finitely many points $x_1, \ldots, x_k \in E$ and real numbers
$\alpha_1 \ge 0, \ldots, \alpha_k \ge 0$. Using the vague topology, we can now show that
every Baire measure on E can be approximated by discrete measures.


17 We shall make essential use of this approximation of the unit $\varepsilon_0$ in Section 8.2.


7.7.3. Theorem. For every locally compact space E countable at
infinity, the set of discrete Baire measures on E is dense in $\mathfrak{M}_+(E)$ relative
to the vague topology.


Proof. Suppose we are given a measure $\mu_0 \in \mathfrak{M}_+$ and a vague
neighborhood

$$V_{f_1, \ldots, f_n; \varepsilon}(\mu_0)$$

of $\mu_0$ in the form described in (7.7.3) ($f_1, \ldots, f_n \in \mathcal{C}^c$; $\varepsilon > 0$). We want
to show the existence of a discrete measure δ lying in $V_{f_1, \ldots, f_n; \varepsilon}(\mu_0)$.
For this we consider a compact Baire set K such that

$$\bigcup_{i=1}^{n} S_{f_i} \subset K$$

and an $\eta > 0$ with $\eta \mu_0(K) \le \varepsilon$. For every $y \in K$ there exists a compact
neighborhood $U_y \in \mathfrak{B}_0(E)$ of y in E such that $|f_i(y') - f_i(y'')| \le \eta$ for
any two points $y', y'' \in U_y$ and arbitrary $i = 1, \ldots, n$.$^{18}$ Finitely
many of these $U_y$, say $U_{y_1}, \ldots, U_{y_k}$, cover K. If we set

$$A_1 = K \cap U_{y_1}, \quad A_2 = (K \cap U_{y_2}) \setminus A_1, \quad \ldots, \quad
A_k = (K \cap U_{y_k}) \setminus (A_1 \cup \cdots \cup A_{k-1}),$$

then $A_1, \ldots, A_k$ are relatively compact, pairwise disjoint Baire sets
such that

$$K = A_1 \cup \cdots \cup A_k$$

and $|f_i(y') - f_i(y'')| \le \eta$ for all $i = 1, \ldots, n$; $j = 1, \ldots, k$ and
arbitrary $y', y'' \in A_j$. Since only these properties will be used below, we
can assume that none of the sets $A_1, \ldots, A_k$ is empty. If we now arbitrarily choose
$x_1 \in A_1, \ldots, x_k \in A_k$, then the discrete measure

$$\delta = \sum_{j=1}^{k} \mu_0(A_j) \, \varepsilon_{x_j}$$

yields the desired result. [Note that the measure $\mu_0(A_j)$ of the relatively
compact sets $A_j$ is finite.] The proof is derived from the following inequalities,
valid for $i = 1, \ldots, n$:

$$\Bigl|\int f_i \, d\mu_0 - \int f_i \, d\delta\Bigr|
= \Bigl|\sum_{j=1}^{k} \int_{A_j} f_i \, d\mu_0 - \sum_{j=1}^{k} \mu_0(A_j) f_i(x_j)\Bigr|
= \Bigl|\sum_{j=1}^{k} \int_{A_j} (f_i - f_i(x_j)) \, d\mu_0\Bigr|
\le \eta \sum_{j=1}^{k} \mu_0(A_j) = \eta \mu_0(K) \le \varepsilon.$$

We have only to recall that $|f_i(x) - f_i(x_j)| \le \eta$ for all $x \in A_j$. ∎


18 By Lemma 7.4.2 we can choose $U_y$, for example, of the form $\{h \ge \beta\}$ with suitable
$h \in \mathcal{C}^c$ and $\beta > 0$.
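
The estimate at the end of this proof can be illustrated in one dimension. What follows is a sketch under assumptions chosen here for concreteness and not taken from the text: $\mu_0$ is Lebesgue measure restricted to $K = [0, 1]$, the cells are $A_j = [j/k, (j+1)/k[$ with $\mu_0(A_j) = 1/k$, the chosen points are the left endpoints $x_j = j/k$, and the single test function is $f(x) = x(1 - x)$ on $[0, 1]$, extended by zero.

```python
def f(x: float) -> float:
    """Continuous test function supported in [0, 1]; its integral over [0, 1] is 1/6."""
    return max(0.0, x * (1.0 - x))


def discrete_integral(g, k: int) -> float:
    """Integral of g against delta = sum_j mu_0(A_j) * eps_{x_j} with x_j = j/k."""
    return sum(g(j / k) / k for j in range(k))


k = 100
eta = 1.0 / k          # f is Lipschitz with constant 1 on [0, 1], so it varies by at most 1/k on each cell
error = abs(discrete_integral(f, k) - 1.0 / 6.0)
```

Here the bound $\eta \mu_0(K) = 1/k$ from the proof is far from sharp; the actual error is much smaller, but the inequality is all that the density argument needs.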


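7.7.4. Corollary. The discrete probability measures on E are dense
in $\mathfrak{M}^1_+(E)$ relative to the vague topology.


Proof. Now suppose in particular that $\mu_0$ is a measure in $\mathfrak{M}^1_+(E)$.
We consider the measure $\delta = \sum \mu_0(A_j) \varepsilon_{x_j}$ in $V_{f_1, \ldots, f_n; \varepsilon}(\mu_0)$ constructed
above, and set $\alpha_j = \mu_0(A_j)$ for $j = 1, \ldots, k$. When $K = E$,
$\alpha_1 + \cdots + \alpha_k = 1$ and there is nothing more to prove. When $K \ne E$, by construction
we have $\alpha_1 + \cdots + \alpha_k = \mu_0(K) \le \mu_0(E) = 1$. Thus it suffices to set

$$\alpha_{k+1} = 1 - (\alpha_1 + \cdots + \alpha_k)$$

and to choose an $x_{k+1} \in E \setminus K$. Then

$$\delta' = \sum_{j=1}^{k+1} \alpha_j \varepsilon_{x_j}$$

is a discrete probability measure satisfying $\int f_i \, d\delta' = \int f_i \, d\delta$ for all
$i = 1, \ldots, n$, since $x_{k+1}$ does not lie in $S_{f_1} \cup \cdots \cup S_{f_n}$. Consequently, when
δ lies in $V_{f_1, \ldots, f_n; \varepsilon}(\mu_0)$, so does $\delta'$. ∎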


Now we investigate the question of whether equality (7.7.1) or (7.7.4)
also holds for more general continuous functions. Here we can immediately
observe that for a measure $\mu \in \mathcal{M}_+^b(E)$ every function $f \in \mathcal{C}^b(E)$ is
$\mu$-integrable since it is $\mathfrak{B}_0(E)$-measurable and its absolute value is bounded
by a constant and therefore by a $\mu$-integrable function. We formulate the
relevant results only for sequences; their generalization to mappings
$t \to \mu_t$ is obvious.


7.7.5. Theorem. If a sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+(E)$ is vaguely convergent
to $\mu \in \mathcal{M}_+(E)$ and the sequence $(\mu_n(E))_{n \in \mathbb{N}}$ is bounded,¹⁹ then $\mu$ is a
finite measure, and for every function $f \in \mathcal{C}^0(E)$,

$$\lim_{n \to \infty} \int f \, d\mu_n = \int f \, d\mu.$$

19 In particular, every measure $\mu_n$ is then finite.


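Proof. For every $\mu \in \mathcal{M}_+(E)$,

$$\mu(E) = \sup_{u \in \mathcal{C}^c,\; 0 \le u \le 1} \int u \, d\mu. \qquad (7.7.5)$$

We need note only that for every compact set $K \subset E$, by Lemma 7.4.2
there exists a $u \in \mathcal{C}^c$ with $0 \le u \le 1$ and $1_K \le u$, and that $u \le 1_{S_u}$ for
every $u \in \mathcal{C}^c$ with $0 \le u \le 1$. Hence

$$\sup_{u \in \mathcal{C}^c,\; 0 \le u \le 1} \int u \, d\mu = \sup_{K \text{ compact},\; K \in \mathfrak{B}_0} \mu(K) = \mu(E),$$

where the second equality follows from the regularity of $\mu$.
Now let $\mu$ be the vague limit of $(\mu_n)$ and $\alpha = \sup_{n \in \mathbb{N}} \mu_n(E)$. Then by (7.7.5),
$\int u \, d\mu_n \le \mu_n(E) \le \alpha$ and hence $\int u \, d\mu \le \alpha$ for all $u \in \mathcal{C}^c$ with $0 \le u \le 1$.
Another application of (7.7.5) then yields $\mu(E) \le \alpha$, that is, the finiteness
of $\mu$. Now let $f \in \mathcal{C}^0(E)$. By Definition 7.4.7, for every $\varepsilon > 0$ there exists
a $g \in \mathcal{C}^c$ with $\|f - g\| \le \varepsilon$. Therefore,

$$\left| \int f \, d\mu_n - \int g \, d\mu_n \right| \le \|f - g\|\, \mu_n(E) \le \alpha\varepsilon \qquad (n = 1, 2, \ldots)$$

and $\left| \int f \, d\mu - \int g \, d\mu \right| \le \alpha\varepsilon$. Using the triangle inequality, we obtain

$$\left| \int f \, d\mu_n - \int f \, d\mu \right| \le 2\alpha\varepsilon + \left| \int g \, d\mu_n - \int g \, d\mu \right|$$

for all $n$, and hence the assertion since $\lim_{n \to \infty} \int g \, d\mu_n = \int g \, d\mu$. ∎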


Remark 1. Even when all measures $\mu_n$ ($n \in \mathbb{N}$) and $\mu$ are finite, we
cannot do without the requirement $\sup \mu_n(E) < \infty$ in general. This is
shown by Example 2 in the special case $E = \mathbb{R}$, $x_n = n$ and $\alpha_n = n$ for
all $n = 1, 2, \ldots$. Indeed, the function $f(x) = \inf(1, 1/|x|)$ for $x \ne 0$
and $f(x) = 1$ for $x = 0$ lies in $\mathcal{C}^0(\mathbb{R})$. But $\int f \, d\mu_n = 1$ for all $n$ and $\int f \, d\mu = 0$.
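The counterexample of Remark 1 is easy to check by hand, and the following Python sketch merely spells out the arithmetic. It assumes, as the remark does, that $\mu_n = n\,\varepsilon_n$ with vague limit $\mu = 0$; the tent function used as a compactly supported test function is a hypothetical choice made for this illustration.

```python
def f(x):
    """f(x) = inf(1, 1/|x|) for x != 0 and f(0) = 1; f vanishes at infinity."""
    return 1.0 if x == 0 else min(1.0, 1.0 / abs(x))

def tent(x, radius=5.0):
    """A continuous function with compact support [-radius, radius]."""
    return max(0.0, 1.0 - abs(x) / radius)

def integral_against_mu_n(g, n):
    """Integral of g with respect to mu_n = n * (unit mass at the point n)."""
    return n * g(n)

for n in (1, 10, 100, 1000):
    # The tent integrals vanish as soon as n leaves the support (vague limit 0),
    # while the integrals of f stay equal to 1 up to rounding.
    print(n, integral_against_mu_n(tent, n), integral_against_mu_n(f, n))
```

The total masses $\mu_n(E) = n$ are unbounded, which is exactly the hypothesis of Theorem 7.7.5 that fails here.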


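We emphasize the transition from $\mathcal{C}^0(E)$ to $\mathcal{C}^b(E)$ by the following:


7.7.6. Definition. Let $\mu, \mu_1, \mu_2, \ldots$ be measures from $\mathcal{M}_+^b(E)$. The
sequence $(\mu_n)_{n \in \mathbb{N}}$ is said to be weakly or Bernoulli convergent to $\mu$ if

$$\lim_{n \to \infty} \int f \, d\mu_n = \int f \, d\mu \quad \text{for all } f \in \mathcal{C}^b(E). \qquad (7.7.6)$$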


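7.7.7. Theorem. A sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+^b(E)$ is weakly convergent
to a measure $\mu \in \mathcal{M}_+^b(E)$ if and only if $(\mu_n)_{n \in \mathbb{N}}$ converges vaguely to $\mu$ and

$$\lim_{n \to \infty} \mu_n(E) = \mu(E). \qquad (7.7.7)$$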


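Proof. The given conditions are necessary for weak convergence since
$\mathcal{C}^c \subset \mathcal{C}^b$ and $1 \in \mathcal{C}^b$. Now we show that these conditions are also sufficient:
By (7.7.5), for every $\varepsilon > 0$ there is a $u \in \mathcal{C}^c$ such that $0 \le u \le 1$ and

$$\mu(E) - \int u \, d\mu = \int (1 - u) \, d\mu < \varepsilon.$$

By hypothesis,

$$\lim_{n \to \infty} \int u \, d\mu_n = \int u \, d\mu \quad \text{and} \quad \lim_{n \to \infty} \int 1 \, d\mu_n = \int 1 \, d\mu.$$

Thus for $n$ sufficiently large,

$$\int (1 - u) \, d\mu_n < \varepsilon.$$

For these $n$ and for all $f \in \mathcal{C}^b$,

$$\left| \int f(1 - u) \, d\mu_n \right| \le \|f\| \int (1 - u) \, d\mu_n \le \|f\|\,\varepsilon.$$

Analogously, $\left| \int f(1 - u) \, d\mu \right| \le \|f\|\,\varepsilon$. As in the preceding proof, we use the
triangle inequality to obtain

$$\left| \int f \, d\mu_n - \int f \, d\mu \right| \le 2\|f\|\,\varepsilon + \left| \int fu \, d\mu_n - \int fu \, d\mu \right|$$

for $n$ sufficiently large. Since $fu$ lies in $\mathcal{C}^c$ and thus $\left( \int fu \, d\mu_n \right)$ converges to
$\int fu \, d\mu$, the assertion now follows. ∎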


7.7.8. Corollary. Every sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+^1(E)$ which converges
vaguely to a measure $\mu \in \mathcal{M}_+^1(E)$ is weakly convergent (and conversely).


Remark 2. The concept of weak convergence obviously can also be introduced
in the following more general situation: Let $E$ be a topological
space, $\mathfrak{B}_0(E)$ the $\sigma$-algebra of its Baire sets, and $\mu, \mu_1, \mu_2, \ldots$ a sequence
of finite Baire measures on $E$. Since the functions in $\mathcal{C}^b(E)$ are $\mathfrak{B}_0(E)$-measurable,
we can define as above: The sequence $(\mu_n)_{n \in \mathbb{N}}$ is said to be
weakly convergent to $\mu$ if (7.7.6) holds. We shall shortly make use of this
possibility for generalization. Moreover, weak convergence can be derived
from a topology in the same way as vague convergence. In (7.7.3), $\mathcal{C}^b(E)$
takes over the role of $\mathcal{C}^c(E)$. We speak of the weak topology on the set of
finite Baire measures on $E$.


Example 1 of this section shows that weak convergence of a sequence
$(\mu_n)$ in $\mathcal{M}_+^b(E)$ to $\mu \in \mathcal{M}_+^b(E)$ does not imply the convergence
$\lim \int f \, d\mu_n = \int f \, d\mu$ for not necessarily continuous, bounded,
$\mathfrak{B}_0(E)$-measurable functions $f$. Nonetheless, we can weaken the continuity
of the functions $f$ for weak convergence. We restrict discussion to the case
in which $E$ has a countable base. Referring to Theorem 7.6.6, we therefore
carry out our reasoning immediately for Polish spaces.


Now let $E$ be a Polish space and $\mathcal{M}_+^b(E)$ the set of its finite Borel measures.
We consider Borel-measurable bounded real functions $f$ on $E$ which are
$\mu$-almost everywhere continuous relative to a measure $\mu \in \mathcal{M}_+^b(E)$. These
are functions which are continuous in all points $x \in \complement N$ where $N \in \mathfrak{B}(E)$
is a $\mu$-null set. Important examples of such functions are the indicator
functions $1_Q$ of null-boundary Borel sets $Q$, which are defined as follows:


7.7.9. Definition. A Borel subset $Q$ of a Polish space $E$ is said to be a
null-boundary set with respect to a measure $\mu \in \mathcal{M}_+^b(E)$, or a
$\mu$-null-boundary set, if $\mu(Q^*) = 0$ for the topological boundary $Q^* = \bar{Q} \setminus \mathring{Q}$ of $Q$.²⁰


Examples 


5. Every interval $I$ of the number line $\mathbb{R}$ is a $\lambda^1$-null-boundary set.


6. A set $Q \in \mathfrak{B}(E)$ is a null-boundary set with respect to a unit mass
$\varepsilon_a$ ($a \in E$) if and only if $a \notin Q^*$.


7.7.10. Theorem. Let $E$ be a Polish space. Suppose a sequence
$(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+^b(E)$ converges weakly to a measure $\mu \in \mathcal{M}_+^b(E)$. Then

$$\lim_{n \to \infty} \int f \, d\mu_n = \int f \, d\mu$$

for every Borel-measurable, bounded, $\mu$-almost everywhere continuous
function $f$. In particular,

$$\lim_{n \to \infty} \mu_n(Q) = \mu(Q)$$

for every $\mu$-null-boundary set $Q \in \mathfrak{B}(E)$.

Proof. Let $d$ denote a metric defining the topology of $E$ and $K_r(x) =
\{y \in E : d(x,y) < r\}$ be the open ball of center $x$ and of radius $r$. By
hypothesis there is a Borel set $E_0$ in $E$ such that $\mu(E_0) = \mu(E)$ and such
that the bounded Borel-measurable function $f$ is continuous at all points
$x \in E_0$. By Theorem 7.3.3, $\mu$ is regular. Therefore for every $\varepsilon > 0$ there
is a compact set $K \subset E_0$ such that $\mu(E_0 \setminus K) \le \varepsilon$. Every point $x \in K$ is
the midpoint of a sphere $U'_x = K_{2r_x}(x)$ such that the variation of $f$ in $U'_x$
is at most $\varepsilon$, that is, $|f(y') - f(y'')| \le \varepsilon$ for all $y', y'' \in U'_x$. Because of
the compactness of the set $K$, we have $K \subset U_{x_1} \cup \cdots \cup U_{x_n}$ for a
suitable finite subset $\{x_1, \ldots, x_n\}$ of $K$ if we set $U_x = K_{r_x}(x)$. If we
define

$$\alpha = \inf f(E), \quad \beta = \sup f(E), \quad \alpha_\nu = \inf f(U'_{x_\nu}), \quad \beta_\nu = \sup f(U'_{x_\nu})$$

for all $\nu = 1, \ldots, n$, then, due to the normality of $E$, for every $\nu = 1,
\ldots, n$ there are functions $g_\nu, h_\nu \in \mathcal{C}^b(E)$ satisfying $g_\nu(x) = \alpha_\nu$ and
$h_\nu(x) = \beta_\nu$ for all $x \in \bar{U}_{x_\nu}$, $g_\nu(x) = \alpha$ and $h_\nu(x) = \beta$ for all $x \in E \setminus U'_{x_\nu}$,
and $\alpha \le g_\nu \le \alpha_\nu \le \beta_\nu \le h_\nu \le \beta$. Then obviously, in particular,
$g_\nu \le f \le h_\nu$. Let

$$g = \sup(g_1, \ldots, g_n), \qquad h = \inf(h_1, \ldots, h_n);$$

then $g$ and $h$ are functions in $\mathcal{C}^b(E)$ with $\alpha \le g \le f \le h \le \beta$. For all
points $x \in K$, $h(x) - g(x) \le \varepsilon$. Indeed, every point $x \in K$ lies in a $U_{x_\nu}$;
consequently, $h(x) - g(x) \le h_\nu(x) - g_\nu(x) = \beta_\nu - \alpha_\nu \le \varepsilon$. Now we can
complete the proof as follows: We have

$$\int (h - g) \, d\mu = \int_K (h - g) \, d\mu + \int_{\complement K} (h - g) \, d\mu \le \varepsilon \mu(K) + (\beta - \alpha)\mu(\complement K) \le \varepsilon(\mu(E) + \beta - \alpha)$$

and since $g \le f \le h$, $g, h \in \mathcal{C}^b(E)$, we also have

$$\int g \, d\mu = \lim_{n \to \infty} \int g \, d\mu_n \le \liminf_{n \to \infty} \int f \, d\mu_n \le \limsup_{n \to \infty} \int f \, d\mu_n \le \lim_{n \to \infty} \int h \, d\mu_n = \int h \, d\mu$$

and

$$\int g \, d\mu \le \int f \, d\mu \le \int h \, d\mu.$$

Therefore, $\int f \, d\mu$, $\liminf \int f \, d\mu_n$, and $\limsup \int f \, d\mu_n$ differ at most by
$\varepsilon(\mu(E) + \beta - \alpha)$. Since $\varepsilon > 0$ was chosen arbitrarily, the assertion now
follows. ∎

20 $\mu$-null-boundary sets are also called $\mu$-squareable.
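To see why the null-boundary hypothesis in Theorem 7.7.10 cannot be dropped, one can look at unit masses. The Python sketch below assumes the sequence $\mu_n = \varepsilon_{1/n}$ on $\mathbb{R}$, which converges weakly to $\mu = \varepsilon_0$; the two sets $Q_1$ and $Q_2$ are illustrative choices and do not come from the text.

```python
def unit_mass(a):
    """The unit mass at a, evaluated on a set given by its membership predicate."""
    return lambda contains: 1.0 if contains(a) else 0.0

def in_Q1(x):
    """Q1 = ]0, 1]: its boundary {0, 1} carries the whole mass of eps_0."""
    return 0.0 < x <= 1.0

def in_Q2(x):
    """Q2 = ]-1, 1[: its boundary {-1, 1} is an eps_0-null set."""
    return -1.0 < x < 1.0

limit = unit_mass(0.0)
for n in (2, 10, 100, 1000):
    mu_n = unit_mass(1.0 / n)
    # mu_n(Q1) stays 1 although limit(Q1) = 0; mu_n(Q2) agrees with limit(Q2) = 1.
    print(n, mu_n(in_Q1), limit(in_Q1), mu_n(in_Q2), limit(in_Q2))
```

This matches Example 6: $Q_1$ fails to be an $\varepsilon_0$-null-boundary set because $0 \in Q_1^*$, whereas $Q_2$ qualifies.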


7.7.11. Theorem. Let $\mu, \mu_1, \mu_2, \ldots$ be probability measures on
$\mathfrak{B}^1 = \mathfrak{B}(\mathbb{R})$ and $F, F_1, F_2, \ldots$ be the associated distribution functions.
Then if the sequence $(\mu_n)_{n \in \mathbb{N}}$ is vaguely convergent to $\mu$,

$$\lim_{n \to \infty} F_n(x) = F(x) \qquad (7.7.8)$$

for all points $x \in \mathbb{R}$ at which $F$ is continuous. If $F$ is continuous on all of
$\mathbb{R}$, then the sequence $(F_n)_{n \in \mathbb{N}}$ converges uniformly to $F$.


Proof. By Theorem 7.7.7 we have weak convergence to $\mu$. Thus
$\lim \mu_n(Q) = \mu(Q)$ for every $\mu$-null-boundary set $Q \in \mathfrak{B}^1$ and hence by the
definition of distribution functions, $\lim_{n \to \infty} F_n(x) = F(x)$ for all $x \in \mathbb{R}$ for
which the interval $Q_x = \,]-\infty, x[$ is a $\mu$-null-boundary set. Now
$]-\infty, x] = \bar{Q}_x = \bigcap_{n=1}^{\infty} Q_{x+1/n}$ and therefore

$$\mu(\bar{Q}_x) = \lim_{n \to \infty} \mu(Q_{x+1/n}) = \lim_{n \to \infty} F\left(x + \tfrac{1}{n}\right).$$

Consequently, $Q_x$ is a $\mu$-null-boundary set if and only if $F$ is right-continuous
in $x$ and therefore (due to the left-continuity of distribution
functions) is continuous in $x$. This proves the first part of the assertion.
For the second part, we assume that $F$ is continuous. For every $\varepsilon > 0$
there are numbers $a < b$ with $F(a) < \varepsilon$ and $1 - F(b) < \varepsilon$. Further, we
can choose finitely many $x_0, \ldots, x_k \in \mathbb{R}$ such that $a = x_0 < x_1 < \cdots
< x_k = b$ and $F(x_i) - F(x_{i-1}) < \varepsilon$ for $i = 1, \ldots, k$. By the first part
of the assertion we can determine $n_0$ so that $|F_n(x_i) - F(x_i)| < \varepsilon$ for $i = 0,
1, \ldots, k$ and all $n \ge n_0$. But then we have $|F_n(x) - F(x)| < 2\varepsilon$ for all $x \in \mathbb{R}$
and all $n \ge n_0$, that is, the convergence to $F$ is uniform. In fact, for $x \le x_0$,
$0 \le F(x) \le F(a) < \varepsilon$ and $0 \le F_n(x) \le F_n(x_0) < F(x_0) + \varepsilon < 2\varepsilon$ and thus
$|F_n(x) - F(x)| < 2\varepsilon$. We conclude analogously for $x \ge b$. When $x_{i-1} \le x \le x_i$
for some $i = 1, \ldots, k$, then $F(x_{i-1}) \le F(x) \le F(x_i) < F(x_{i-1}) + \varepsilon$ and
$F(x_{i-1}) - \varepsilon < F_n(x_{i-1}) \le F_n(x) \le F_n(x_i) < F(x_i) + \varepsilon < F(x_{i-1}) + 2\varepsilon$,
hence again $|F_n(x) - F(x)| < 2\varepsilon$. ∎
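The uniform convergence asserted in Theorem 7.7.11 can be observed on a simple example. The Python sketch below assumes, purely for illustration, that $\mu_n$ is the uniform distribution on the points $1/n, 2/n, \ldots, n/n$ and $\mu$ the uniform distribution on $[0,1]$; distribution functions follow the left-continuous convention $F(x) = \mu(]-\infty, x[)$ used in the proof, and the supremum is only approximated on a finite grid.

```python
def F(x):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))

def F_n(x, n):
    """Left-continuous distribution function mu_n(]-inf, x[) of the uniform
    distribution mu_n on the n points 1/n, 2/n, ..., n/n."""
    return sum(1 for k in range(1, n + 1) if k / n < x) / n

def sup_distance(n, grid_size=2000):
    """Approximate sup over x of |F_n(x) - F(x)| on a grid covering [-0.5, 1.5]."""
    xs = [-0.5 + 2.0 * j / grid_size for j in range(grid_size + 1)]
    return max(abs(F_n(x, n) - F(x)) for x in xs)

for n in (5, 50, 500):
    # The distance never exceeds 1/n, so it tends to 0 uniformly in x.
    print(n, sup_distance(n))
```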


Remark 3. The second condition in Theorem 7.7.10 and the first in 
Theorem 7.7.11 are also sufficient for weak convergence of the sequence 
(un) to u. See the following Problems 4 and 5. 


PROBLEMS 


1. Let $E$ be a locally compact space countable at infinity, and let $D$ be
a rich subset of $\mathcal{C}^c(E)$ (see Section 7.6, Problem 2). Assume that for
every compact set $K \subset E$ there exists a function $d$ in $D$ such that
$0 \le d \le 1$ and $d(x) = 1$ for all $x \in K$. Prove:

(a) A sequence $(\mu_n)$ in $\mathcal{M}_+(E)$ is vaguely converging to $\mu \in \mathcal{M}_+(E)$ if
and only if $\lim \mu_n(d) = \mu(d)$ for all $d \in D$.

(b) For $E = \mathbb{R}$ the set $D$ of all continuously differentiable functions
$d \in \mathcal{C}^c(\mathbb{R})$ has all the properties mentioned above.
2. Use Problem 1 in order to prove that the sequence $\mu_n = f_n \lambda^1$, $n \in \mathbb{N}$,
where $f_n$ is the function $x \to 1 - \sin nx$ on $\mathbb{R}$, converges vaguely to $\lambda^1$.
3. Let $\mu$ be a finite Borel measure on a Polish space $E$. Prove:
(a) The system $\mathcal{Q}_\mu$ of all $\mu$-null-boundary sets is an algebra in $E$.
(b) For every $f \in \mathcal{C}^b(E)$ there exists a countable set $D_f \subset \mathbb{R}$ such
that $\{f > \alpha\} \in \mathcal{Q}_\mu$ for all $\alpha \in \mathbb{R} \setminus D_f$. [Hint: For every finite
subset $\{\alpha_1, \ldots, \alpha_n\}$ of $\mathbb{R}$ one has $\sum_{i=1}^{n} \mu(\{f = \alpha_i\}) \le \mu(E)
< +\infty$.]
4. Let $\mu, \mu_1, \mu_2, \ldots$ be finite Borel measures on a Polish space. Prove:
The condition of Theorem 7.7.10 is not only necessary but also
sufficient for weak convergence, that is, $(\mu_n)$ is weakly convergent to
$\mu$ if $\lim \mu_n(Q) = \mu(Q)$ for all $\mu$-null-boundary sets $Q$. [Hint: Imitate
the proof of Theorem 2.3.6 and use Problem 3 in order to prove that
every $f \in \mathcal{C}^b(E)$ is the uniform limit of an increasing sequence $(u_n)$
of step functions over $\mathcal{Q}_\mu$ (the space of these step functions being
defined as in Section 7.1, Example 1).]
5. Prove that the condition (7.7.8) is also sufficient for weak convergence
of $(\mu_n)$ to $\mu$.

6. Let $(\alpha_n)_{n \in \mathbb{N}}$ be a sequence of numbers in the interval $]0,1[$. Remove
from $[0,1]$ an open interval $I_{11}$ of length $\alpha_1$ containing the center $\tfrac{1}{2}$ of
$[0,1]$. This leaves two disjoint closed intervals $J_{11}$ and $J_{12}$. Remove
from $J_{1i}$ an open interval $I_{2i}$ of length $\alpha_2 \lambda^1(J_{1i})$ containing the center
of $J_{1i}$ ($i = 1, 2$). This leaves four pairwise disjoint closed intervals
$J_{21}, \ldots, J_{24}$. Remove from $J_{2i}$ an open interval $I_{3i}$ of length
$\alpha_3 \lambda^1(J_{2i})$ containing the center of $J_{2i}$ ($i = 1, 2, 3, 4$). This leaves $8 = 2^3$
pairwise disjoint closed intervals $J_{3i}$, $i = 1, \ldots, 8$. Continuing
this way one obtains for each $n \in \mathbb{N}$ pairwise disjoint closed intervals
$J_{n1}, \ldots, J_{n2^n}$. The set

$$C = \bigcap_{n=1}^{\infty} \left( J_{n1} \cup \cdots \cup J_{n2^n} \right)$$

is known as a Cantor-like set, and for $\alpha_n = \tfrac{1}{3}$ ($n \in \mathbb{N}$) as Cantor
(ternary) set. Prove:
(a) $C$ is compact and nonempty, and has empty interior.
(b) $\lambda^1(C) = \lim_{n \to \infty} \prod_{i=1}^{n} (1 - \alpha_i)$.
(c) $\lambda^1(C) = 0$ if and only if $\sum_{n=1}^{\infty} \alpha_n = +\infty$. [Hint: Use (6.2.4) and
the theory of absolutely convergent infinite products.]
(d) If $\sum_{n=1}^{\infty} \alpha_n < +\infty$, $U = \,]0,1[\, \setminus C$ is an open set in $\mathbb{R}$ whose
boundary $U^* = \bar{U} \setminus U$ is not a $\lambda^1$-null set.
7. Construct an open subset of $]0,1[ \times ]0,1[$ whose boundary in $\mathbb{R}^2$ has
positive $\lambda^2$-measure.
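For orientation on Problem 6(b) and (c), the following Python sketch evaluates the partial products $\prod_{i=1}^{n}(1 - \alpha_i)$, which give the total length left after $n$ removal steps. The two sequences are illustrative choices: $\alpha_n = 1/3$ (divergent sum) and $\alpha_n = 1/(n+1)^2$ (convergent sum, an assumption made only for this example). This is a numerical check, not a solution of the problem.

```python
import math

def remaining_length(alphas):
    """Total length of the closed intervals left after len(alphas) steps, when
    step i removes the fraction alphas[i-1] of every remaining interval."""
    return math.prod(1.0 - a for a in alphas)

n_steps = 60
ternary = [1.0 / 3.0] * n_steps
summable = [1.0 / (n + 1) ** 2 for n in range(1, n_steps + 1)]

# Divergent sum of the alphas: the length (2/3)**n_steps tends to 0.
print(remaining_length(ternary))
# Convergent sum: the product telescopes to (n_steps + 2) / (2 * (n_steps + 1)),
# which tends to 1/2, so the limit set has positive measure.
print(remaining_length(summable))
```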


7.8 VAGUELY COMPACT SETS OF MEASURES 


We again consider a locally compact space $E$ which is countable at infinity,
and the space $\mathcal{M}_+ = \mathcal{M}_+(E)$ of all Baire measures on $E$, equipped with the
vague topology. We are interested in the compact (relatively compact)
subsets of $\mathcal{M}_+$ in this topology, which we designate as vaguely compact
(vaguely relatively compact).

We can immediately give a necessary condition for vague relative
compactness of a set $H \subset \mathcal{M}_+$. By the definition of vague topology, the
real function $\mu \to \int f \, d\mu$ is continuous on $\mathcal{M}_+$ for every function $f \in \mathcal{C}^c(E)$.
Consequently, the image of $H$ under this mapping is a relatively compact
subset of $\mathbb{R}$, and hence is bounded. This observation leads to the following
definition:


7.8.1. Definition. A set $H \subset \mathcal{M}_+(E)$ is said to be bounded if

$$\sup_{\mu \in H} \left| \int f \, d\mu \right| < +\infty \quad \text{for all } f \in \mathcal{C}^c(E). \qquad (7.8.1)$$


Thus, boundedness is necessary for vague relative compactness of a set
$H \subset \mathcal{M}_+$. We shall show that this condition is also sufficient.


7.8.2. Theorem. A set $H \subset \mathcal{M}_+(E)$ is vaguely relatively compact if
and only if it is bounded.


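Proof. We need only show that boundedness of $H$ implies vague relative
compactness. Thus let $\alpha_f$, for all $f \in \mathcal{C}^c$, denote the supremum
occurring in (7.8.1), $I_f$ the compact interval $[-\alpha_f, \alpha_f]$ in $\mathbb{R}$, and $\bar{H}$ the
closed hull of $H$ in $\mathcal{M}_+$. Then $\int f \, d\mu \in I_f$ for all $f \in \mathcal{C}^c$ and all $\mu \in \bar{H}$. In
fact, for every $f \in \mathcal{C}^c$ and every $\varepsilon > 0$

$$V_{f;\varepsilon} = V_{f;\varepsilon}(\mu) = \left\{ \nu \in \mathcal{M}_+ : \left| \int f \, d\nu - \int f \, d\mu \right| < \varepsilon \right\}$$

is a vague neighborhood of $\mu$, that is, $H \cap V_{f;\varepsilon} \ne \emptyset$. But $\int f \, d\nu$ lies in
$I_f$ for every $\nu \in H \cap V_{f;\varepsilon}$ and thus

$$\left| \int f \, d\mu \right| \le \alpha_f + \left| \int f \, d\nu - \int f \, d\mu \right| < \alpha_f + \varepsilon$$

for every $\varepsilon > 0$. This implies $\int f \, d\mu \in I_f$.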


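Now we consider the Hausdorff product space $\mathbb{R}^{\mathcal{C}^c} = \prod_{f \in \mathcal{C}^c} \mathbb{R}_f$, which we
obtain by associating a copy $\mathbb{R}_f = \mathbb{R}$ of the real line with each $f$. The
product space $I = \prod_{f \in \mathcal{C}^c} I_f$ is a subspace of $\mathbb{R}^{\mathcal{C}^c}$ which, according to the
familiar theorem of Tychonov, is compact as the product of compact
spaces. We can map $\mathcal{M}_+$ into $\mathbb{R}^{\mathcal{C}^c}$ by associating with each measure $\mu$ the
real function $f \to \int f \, d\mu$ on $\mathcal{C}^c$ which lies in $\mathbb{R}^{\mathcal{C}^c}$. Thus we have a mapping
$\Phi: \mathcal{M}_+ \to \mathbb{R}^{\mathcal{C}^c}$ which is injective by Theorem 7.5.4. According to what was
shown in the introduction, $\Phi(\bar{H}) \subset I$. Hence the assertion is proved if we
can show: (a) $\Phi$ maps $\mathcal{M}_+$ homeomorphically onto $\Phi(\mathcal{M}_+)$. (b) $\Phi(\mathcal{M}_+)$ is closed
in $\mathbb{R}^{\mathcal{C}^c}$. Then $\Phi(\bar{H})$, as a closed subset of $\Phi(\mathcal{M}_+)$, is also closed in $\mathbb{R}^{\mathcal{C}^c}$, and
$\Phi(\bar{H}) \subset I$ implies the compactness of $\Phi(\bar{H})$ and thus of $\bar{H}$.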

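To prove (a): Continuity of $\Phi$ is equivalent to continuity of every
"component" of $\Phi$, that is, every mapping $\mu \to \int f \, d\mu$ with $f \in \mathcal{C}^c$. But
this continuity follows from the definition of the vague topology. Continuity
of the inverse mapping $\Psi$ of $\Phi$ is equivalent to the continuity of
$\Phi(\mu) \to \int f \, d\Psi(\Phi(\mu)) = \int f \, d\mu$ on $\Phi(\mathcal{M}_+)$ for each $f \in \mathcal{C}^c$. But this mapping
is the restriction to $\Phi(\mathcal{M}_+)$ of the projection of $\mathbb{R}^{\mathcal{C}^c}$ on the $f$th coordinate
axis. To prove (b): Let $I \in \mathbb{R}^{\mathcal{C}^c}$ be a point in the closure of $\Phi(\mathcal{M}_+)$ in $\mathbb{R}^{\mathcal{C}^c}$.
Then $I$ is a positive linear form on $\mathcal{C}^c$. Indeed, let $f, g$ be functions from $\mathcal{C}^c$
and $\varepsilon > 0$. Then

$$\left\{ I' \in \mathbb{R}^{\mathcal{C}^c} : |I'(f+g) - I(f+g)| < \varepsilon,\ |I'(f) - I(f)| < \varepsilon,\ |I'(g) - I(g)| < \varepsilon \right\}$$

is a neighborhood of $I$ in $\mathbb{R}^{\mathcal{C}^c}$ and thus contains a point $I' = \Phi(\mu) \in \Phi(\mathcal{M}_+)$.
Hence $I'$ is the positive linear form $q \to \int q \, d\mu$ on $\mathcal{C}^c$; hence, we obtain

$$|I(f+g) - I(f) - I(g)| \le |I(f+g) - I'(f+g)| + |I'(f) - I(f)| + |I'(g) - I(g)| < 3\varepsilon.$$

Since $\varepsilon > 0$ was chosen arbitrarily, $I(f + g) = I(f) + I(g)$. Analogously
we see that $I(\alpha f) = \alpha I(f)$ ($f \in \mathcal{C}^c$, $\alpha \in \mathbb{R}$) and $I(f) \ge 0$ for $f \in \mathcal{C}^c_+$. Thus
by Theorem 7.5.4 there is exactly one $\nu \in \mathcal{M}_+$ with $\Phi(\nu) = I$; that is, $I$
lies in $\Phi(\mathcal{M}_+)$. Hence $\Phi(\mathcal{M}_+)$ is closed.

This proves the theorem. ∎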


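7.8.3. Corollary. For every real number $\alpha \ge 0$, the set

$$S_\alpha = \{ \mu \in \mathcal{M}_+^b(E) : \mu(E) \le \alpha \}$$

is vaguely compact.


Proof. For each $f \in \mathcal{C}^c$ and all $\mu \in S_\alpha$, $\left| \int f \, d\mu \right| \le \int |f| \, d\mu \le \alpha \|f\|$.
Therefore $S_\alpha$ is bounded and thus vaguely relatively compact. Now we
still have to show that $S_\alpha$ is closed in $\mathcal{M}_+$. By (7.7.5), $S_\alpha$ is the set of all
$\mu \in \mathcal{M}_+$ such that $\int u \, d\mu \le \alpha$ for all $u \in \mathcal{C}^c$ with values between 0 and 1.
Since $\mu \to \int u \, d\mu$ is continuous, the set $A_u = \{ \mu \in \mathcal{M}_+ : \int u \, d\mu \le \alpha \}$ is
closed for each $u \in \mathcal{C}^c$. Since

$$S_\alpha = \bigcap_{u \in \mathcal{C}^c,\; 0 \le u \le 1} A_u,$$

$S_\alpha$ is also vaguely closed. ∎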


Remark 1. The set of all measures $\mu \in \mathcal{M}_+(E)$ such that $\mu(E) = \alpha$ is in
general not closed and therefore not compact. This can be seen from
Example 2 of Section 7.7 if we choose the numbers $\alpha_n$ there all equal to $\alpha$.
More precisely, the example shows that (for $\alpha = 1$) for noncompact $E$,
the set $\mathcal{M}_+^1(E)$ is not closed. $\mathcal{M}_+^1(E)$ is closed for compact $E$, since the
constant function 1 lies in $\mathcal{C}^c(E) = \mathcal{C}(E)$.


In order to be able to use sequences in investigating the vague topology
on $\mathcal{M}_+(E)$, we still need to know when $\mathcal{M}_+(E)$ is metrizable. Therefore we
show:


7.8.4. Theorem. Let $E$ be a locally compact space countable at
infinity. The vague topology on $\mathcal{M}_+(E)$ is metrizable if and only if $E$ has a
countable base.


Proof. The mapping $x \to \varepsilon(x) = \varepsilon_x$ of $E$ into $\mathcal{M}_+(E)$ is injective and a
homeomorphism of $E$ onto $\varepsilon(E)$. The latter can be seen as follows: For
every point $x \in E$, the sets

$$M_{f_1, \ldots, f_n; \eta}(x) = \{ y \in E : |f_i(x) - f_i(y)| < \eta,\ i = 1, \ldots, n \}$$

($f_1, \ldots, f_n \in \mathcal{C}^c$; $\eta > 0$; $n = 1, 2, \ldots$) form a fundamental system of
neighborhoods of $x$. (Indeed, if $U$ is a neighborhood of $x \in E$, then by
Lemma 7.4.2 there is an $f \in \mathcal{C}^c$ satisfying $0 \le f \le 1$, $f(x) = 1$, and
$S_f \subset U$. But then $M_{f;1}(x) \subset U$.) Using the notation of (7.7.3), we have

$$\varepsilon\bigl(M_{f_1, \ldots, f_n; \eta}(x)\bigr) = \varepsilon(E) \cap V_{f_1, \ldots, f_n; \eta}(\varepsilon_x)$$

for arbitrary $f_1, \ldots, f_n \in \mathcal{C}^c$ and $\eta > 0$, which proves that $\varepsilon$ is a
homeomorphism of $E$ onto $\varepsilon(E)$. Therefore, the metrizability of $\mathcal{M}_+(E)$
and hence of $\varepsilon(E)$ implies that of $E$. But for a locally compact space
countable at infinity, metrizability is equivalent to the existence of a
countable base.²¹

Now assume the existence of a countable base for $E$. Then by Theorem
7.6.3 there is a countable set $D_0 \subset \mathcal{C}^c$ which is dense in $\mathcal{C}^c$.

Moreover, we can choose a sequence $(K_n)$ of compact sets such that
$K_n \subset \mathring{K}_{n+1}$ for all $n = 1, 2, \ldots$ and $\bigcup K_n = E$, and a sequence $(e_n)$ in
$\mathcal{C}^c$ such that $0 \le e_n \le 1$ and $e_n(x) = 1$ for all $x \in K_n$ ($n = 1, 2, \ldots$).
Then

$$D = D_0 \cup \{ d e_n : d \in D_0,\ n \in \mathbb{N} \} \cup \{ e_n : n \in \mathbb{N} \}$$

is again a countable subset of $\mathcal{C}^c$ which is dense in $\mathcal{C}^c$.

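As in the proof of Theorem 7.8.2, we consider the mapping $\Phi: \mathcal{M}_+ \to \mathbb{R}^D$
which, with every measure $\mu \in \mathcal{M}_+$, associates the real function $d \to \int d \, d\mu$
defined on $D$. Then $\Phi$ is injective for the following reason: Let $\mu, \nu \in \mathcal{M}_+$
with $\int d \, d\mu = \int d \, d\nu$ for all $d \in D$, and suppose we are given $f \in \mathcal{C}^c$. The
support $S_f$ is contained in a $K_{n_0}$. For every $\varepsilon > 0$ there is a $d \in D_0$ with
$\|f - d\| \le \varepsilon$. But then $f = f e_{n_0}$ and thus $|f - d e_{n_0}| \le \varepsilon e_{n_0}$, that is,
$\left| \int f \, d\mu - \int d e_{n_0} \, d\mu \right| \le \varepsilon \int e_{n_0} \, d\mu$ and $\left| \int f \, d\nu - \int d e_{n_0} \, d\nu \right| \le \varepsilon \int e_{n_0} \, d\nu$. Since
$d e_{n_0}, e_{n_0} \in D$, this implies $\left| \int f \, d\mu - \int f \, d\nu \right| \le 2\varepsilon \int e_{n_0} \, d\mu$ for each $\varepsilon > 0$. Thus
$\int f \, d\mu = \int f \, d\nu$ for every $f \in \mathcal{C}^c$, that is, $\mu = \nu$. We show the continuity of
$\Phi$ as in the proof of Theorem 7.8.2. Again $\Phi$ is a homeomorphism of $\mathcal{M}_+$
onto $\Phi(\mathcal{M}_+)$. For this we need show only (see the proof of Theorem 7.8.2)
that for every $f \in \mathcal{C}^c$, the mapping $\Phi(\mu) \to \int f \, d\mu$ is continuous on $\Phi(\mathcal{M}_+)$.
If $f$ lies in $D$, then this mapping is the restriction to $\Phi(\mathcal{M}_+)$ of the projection
of $\mathbb{R}^D$ onto the $f$th coordinate axis, whence follows the desired continuity.
For an arbitrary $f \in \mathcal{C}^c$, as above, for $\varepsilon > 0$ we determine an $e_{n_0}$ and a
$d \in D_0$ with $|f - d e_{n_0}| \le \varepsilon e_{n_0}$. Then

$$\left| \int f \, d\mu - \int d e_{n_0} \, d\mu \right| \le \varepsilon \int e_{n_0} \, d\mu,$$

where, according to what has just been shown, $\Phi(\mu) \to \int d e_{n_0} \, d\mu$ and
$\Phi(\mu) \to \int e_{n_0} \, d\mu$ are continuous on $\Phi(\mathcal{M}_+)$. Then, in particular, $\Phi(\mu) \to \int e_{n_0} \, d\mu$
is locally bounded on $\Phi(\mathcal{M}_+)$, that is, every point $\Phi(\mu_0) \in \Phi(\mathcal{M}_+)$ has an open
neighborhood $U_{\mu_0}$ in $\Phi(\mathcal{M}_+)$ on which $\Phi(\mu) \to \int e_{n_0} \, d\mu$ is bounded. The above
inequality then implies that $\Phi(\mu) \to \int f \, d\mu$ can be approximated uniformly
on $U_{\mu_0}$ by continuous functions of the form $\Phi(\mu) \to \int d e_{n_0} \, d\mu$ with $d \in D_0$.
Therefore $\Phi(\mu) \to \int f \, d\mu$ is continuous on every $U_{\mu_0}$, hence continuous on
$\Phi(\mathcal{M}_+)$.

21 See N. Bourbaki [26] as well as Section 7.3, Example 6 and the proof of Theorem
7.6.1.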

Now we need only recall the familiar theorem²² according to which
every countable product of metrizable spaces, in particular $\mathbb{R}^D$, is metrizable.
Then the subspace $\Phi(\mathcal{M}_+)$ and the space $\mathcal{M}_+$ homeomorphic to it are
also metrizable. ∎


Remark 2. Similarly, but more simply, it can be shown that for every 
Polish space E, the weak topology on the set of finite Baire measures on E 
is metrizable. The details are left to the reader since we do not make 
explicit use of this result. 


PROBLEMS 


1. Let $\nu$ be a Baire measure on a locally compact space $E$ countable at
infinity. Prove: The set $H$ of all $\mu \in \mathcal{M}_+(E)$ satisfying $0 \le \int u \, d\mu \le
\int u \, d\nu$ for all $u \in \mathcal{C}^c_+(E)$ is vaguely compact.

2. Let $E$ be a locally compact space with a countable base and let $(d_n)_{n \in \mathbb{N}}$
be a sequence in $\mathcal{C}^c(E)$ such that the set $D$ of all $d_n$ has the properties
mentioned in Section 7.6, Problem 2 (b) and (c). Define

$$\rho(\mu, \nu) := \sum_{n=1}^{\infty} 2^{-n} \min\left( \left| \int d_n \, d\mu - \int d_n \, d\nu \right|,\, 1 \right)$$

for Borel measures $\mu, \nu$ on $E$. Prove: $\rho$ is a metric on $\mathcal{M}_+(E)$. The corresponding
topology is the vague topology on $\mathcal{M}_+(E)$.


22 See N. Bourbaki [26].


8 FOURIER ANALYSIS


Below we present the main features of the theory of Fourier transforms.
This theory is one of the most powerful tools of analysis available in
probability theory. It leads to elegant solutions of many of the problems
involving convolution of measures. The importance of the convolution
product for probability theory was introduced to the reader mainly in
the discussions in Section 5.3. We restrict consideration to the case of
measures on Euclidean space $\mathbb{R}^p$, although the theory can be fully developed
on locally compact Abelian groups. (See Rudin [42] and Hewitt-Ross [36].)


8.1 FOURIER TRANSFORMS OF MEASURES AND FUNCTIONS 


(a) Integration of complex-valued functions 


The integration theory developed for real and numerical functions can
be extended to complex-valued functions with a few comments. Let $\mathbb{C}$
denote the field of complex numbers $z = x + iy$ (with real part $x = \operatorname{Re} z$
and imaginary part $y = \operatorname{Im} z$). As a topological space, $\mathbb{C}$ is equal to $\mathbb{R}^2$, and
thus is equipped with the $\sigma$-algebra $\mathfrak{B}^2$ of Borel sets. Measurability of
complex functions will henceforth always mean measurability relative
to $\mathfrak{B}^2$.

Now let $f: \Omega \to \mathbb{C}$ be a complex function on a measure space $(\Omega, \mathfrak{A}, \mu)$ and
$f = u + iv$ its decomposition into the real part $u = \operatorname{Re} f$ and the imaginary
part $v = \operatorname{Im} f$. Since $f$ is just the product mapping $u \otimes v$ of $\Omega$ into $\mathbb{R}^2 = \mathbb{C}$
(see Section 5.2), the ($\mathfrak{A}$-$\mathfrak{B}^2$)-measurability of $f$ is equivalent to the
measurability of the real functions $u$ and $v$. We now say that $f$ is
($\mu$-)integrable if $u$ and $v$ are $\mu$-integrable. The complex number

$$\int f \, d\mu = \int u \, d\mu + i \int v \, d\mu \qquad (8.1.1)$$

is then called the integral of $f$ with respect to $\mu$.

Many important properties of the integral carry over at once. The
complex $\mu$-integrable functions form a vector space (over $\mathbb{C}$), which we
denote by $\mathcal{L}^1(\mu, \mathbb{C})$. Then $f \to \int f \, d\mu$ is a linear mapping of $\mathcal{L}^1(\mu, \mathbb{C})$ into $\mathbb{C}$.
The transition to the complex conjugate is accomplished by

$$\int \bar{f} \, d\mu = \overline{\int f \, d\mu}. \qquad (8.1.2)$$

Here, $\bar{f}$ is always $\mu$-integrable whenever $f$ is.
The following theorem gives us information about the absolute value of
functions $f \in \mathcal{L}^1(\mu, \mathbb{C})$.


8.1.1. Theorem. A measurable complex function $f$ on $\Omega$ is $\mu$-integrable
if and only if $|f|$ is $\mu$-integrable. Then

$$\left| \int f \, d\mu \right| \le \int |f| \, d\mu. \qquad (8.1.3)$$


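Proof. First, when $f$ is measurable, $|f|$ is also measurable. We need
only compose the mapping $\omega \to f(\omega)$ with $z \to |z|$ and observe that $z \to |z|$
is continuous and hence $\mathfrak{B}^2$-measurable. The first part of the assertion
now follows by Theorem 2.4.2 from the inequalities

$$|f| \le |u| + |v|, \quad |u| \le |f| \quad \text{and} \quad |v| \le |f|$$

for the real and imaginary part of $f$.
Inequality (8.1.3) is obtained as follows for integrable $f$. For every
$z \in \mathbb{C}$, $\operatorname{Re} z \le |z|$ and hence

$$\operatorname{Re}\left( \overline{\int f \, d\mu} \cdot f \right) \le \left| \int f \, d\mu \right| |f|.$$

By integrating, we obtain

$$\left| \int f \, d\mu \right|^2 = \operatorname{Re}\left( \overline{\int f \, d\mu} \cdot \int f \, d\mu \right) = \int \operatorname{Re}\left( \overline{\int f \, d\mu} \cdot f \right) d\mu \le \left| \int f \, d\mu \right| \cdot \int |f| \, d\mu$$

and thus the desired inequality. ∎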


Now also the spaces $\mathcal{L}^p(\mu, \mathbb{C})$ for $1 \le p < \infty$ can be defined in analogy
to the real case: $\mathcal{L}^p(\mu, \mathbb{C})$ is the set of all measurable complex functions $f$
on $\Omega$ for which $|f|^p$ is $\mu$-integrable. We define $\mathcal{L}^\infty(\mu, \mathbb{C})$ as in the real case.
We likewise extend the definition of integrability to the case of measurable
complex functions defined $\mu$-almost everywhere on $\Omega$. The reader should
verify that the properties proved in the real case for these concepts carry
over easily to the complex case.
If $(\Omega, \mathfrak{A}, P)$ is a probability space and $X$ is a complex random variable,
that is, an $\mathfrak{A}$-$\mathfrak{B}^2$-measurable mapping $X: \Omega \to \mathbb{C} = \mathbb{R}^2$, then

$$E(X) = \int X \, dP$$

is again called the expected value of $X$ provided $X$ is $P$-integrable.

For any two functions $f_1, f_2 \in \mathcal{L}^1(\lambda^p, \mathbb{C})$, where $\lambda^p$ again denotes the
L-B-measure on $\mathbb{R}^p$, we can now define the convolution product $f_1 * f_2$
by decomposing $f_j$ into real and imaginary parts $f_j = u_j + i v_j$ ($j = 1, 2$):

$$f_1 * f_2 = u_1 * u_2 - v_1 * v_2 + i(u_1 * v_2 + u_2 * v_1).$$

Thus obviously, as in the real case, $f_1 * f_2$ is defined $\lambda^p$-almost everywhere
on $\mathbb{R}^p$ and $\lambda^p$-integrable. The definition is formulated in such a way that
the properties (3.4.15)-(3.4.17) and (3.4.18), for arbitrary $\alpha \in \mathbb{C}$, are
preserved.

Finally, if $E$ is a locally compact space, then we use $\mathcal{C}(E, \mathbb{C})$ [$\mathcal{C}^b(E, \mathbb{C})$] to
denote the vector space of all continuous [continuous, bounded] complex
functions on $E$. By $\mathcal{C}^c(E, \mathbb{C})$ we denote the vector space of all $f \in \mathcal{C}(E, \mathbb{C})$
whose support $S_f$ is compact. Definition 7.4.1 of the support carries over
verbatim to complex functions. On $\mathcal{C}^b(E, \mathbb{C})$ we again use

$$\|f\| = \sup_{x \in E} |f(x)|$$

to define the norm of uniform convergence. We then let $\mathcal{C}^0(E, \mathbb{C})$ correspond
to the closure of $\mathcal{C}^c(E, \mathbb{C})$ in $\mathcal{C}^b(E, \mathbb{C})$ with respect to the metric of uniform
convergence. The functions in $\mathcal{C}^0(E, \mathbb{C})$ are again said to vanish at infinity.
The characterization of $\mathcal{C}^0(E)$ given in Theorem 7.4.8, including the proof,
carries over to $\mathcal{C}^0(E, \mathbb{C})$. In particular, we thus have for $f \in \mathcal{C}(E, \mathbb{C})$

$$f \in \mathcal{C}^0(E, \mathbb{C}) \iff |f| \in \mathcal{C}^0(E).$$


(b) Definition and elementary properties of Fourier transforms 


The following considerations involve the set $\mathcal{M}_+^b = \mathcal{M}_+^b(\mathbb{R}^p)$ of finite
Borel measures on the Euclidean space $\mathbb{R}^p$, $p = 1, 2, \ldots$; here $\mathbb{R}^p$ is
equipped with the usual Euclidean scalar product

$$\langle x, y \rangle = \sum_{i=1}^{p} x_i y_i \qquad (8.1.4)$$

and the Euclidean norm

$$|x| = \sqrt{\langle x, x \rangle}. \qquad (8.1.5)$$

Here $x, y$ denote points in $\mathbb{R}^p$ with coordinates $x = (x_1, \ldots, x_p)$ and
$y = (y_1, \ldots, y_p)$.
For every measure $\mu \in \mathcal{M}_+^b$ we call

$$\|\mu\| = \int d\mu = \mu(\mathbb{R}^p) \qquad (8.1.6)$$

the total mass of $\mu$. Since $\|\mu\| < +\infty$, every bounded, continuous, complex
function $f$ on $\mathbb{R}^p$ is $\mu$-integrable. In particular $y \to e^{i\langle x, y \rangle}$ is such a function
for every $x \in \mathbb{R}^p$, since $|e^{it}| = 1$ for all $t \in \mathbb{R}$.


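8.1.2. Definition. The Fourier transform of a measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$
is the complex function $\hat{\mu}$ on $\mathbb{R}^p$ defined by

$$\hat{\mu}(x) = \int e^{i\langle x, y \rangle} \, \mu(dy) \qquad (x \in \mathbb{R}^p). \qquad (8.1.7)$$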


In probability theory we are interested in Fourier transforms of distributions
$P_X$ of random variables $X$ with values in $\mathbb{R}^p$; these are defined
since $P_X$ lies in $\mathcal{M}_+^1(\mathbb{R}^p)$.


8.1.3. Definition. Let $X$ be an $(\mathbb{R}^p, \mathfrak{B}^p)$-random variable on a probability
space $(\Omega, \mathfrak{A}, P)$. Then $\hat{P}_X$ is called the characteristic function of $X$. We
also denote it by $\varphi_X$.


According to the transformation formula (4.3.6),

$$\hat{P}_X(x) = E\left( e^{i\langle x, X \rangle} \right) \qquad (x \in \mathbb{R}^p). \qquad (8.1.8)$$

The term "characteristic function" will be justified by the Uniqueness
Theorem 8.2.4, according to which the distribution $P_X$ is uniquely determined
by the characteristic function $\hat{P}_X$. Moreover, (8.1.7) and (8.1.8)
define the same mathematical object, since every measure $\mu \in \mathcal{M}_+^1(\mathbb{R}^p)$
is the distribution of a random variable with values in $\mathbb{R}^p$, for example,
the identity mapping of the probability space $(\mathbb{R}^p, \mathfrak{B}^p, \mu)$ onto itself.
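Examples


1. For the measure $\varepsilon_a \in \mathcal{M}_+^1(\mathbb{R}^p)$ defined by the unit mass at $a \in \mathbb{R}^p$,

$$\hat{\varepsilon}_a(x) = e^{i\langle x, a \rangle} \qquad (x \in \mathbb{R}^p).$$

This is the characteristic function of a (degenerate) random variable which
is almost surely equal to $a$. In particular, $\hat{\varepsilon}_0 = 1$.


2. For every discrete distribution $\mu = \sum_n \alpha_n \varepsilon_{a_n} \in \mathcal{M}_+^1(\mathbb{R}^p)$ (see Section
4.4, Example 2), we thus have

$$\hat{\mu}(x) = \sum_n \alpha_n e^{i\langle x, a_n \rangle} \qquad (x \in \mathbb{R}^p).$$

Special cases of this are:


(a) The binomial distribution $\beta_p^n = \sum_{\nu=0}^{n} \binom{n}{\nu} p^\nu q^{n-\nu} \varepsilon_\nu$ on $\mathbb{R}$ ($0 < p < 1$,
$q = 1 - p$, $n = 1, 2, \ldots$) has Fourier transform

$$\hat{\beta}_p^n(x) = \sum_{\nu=0}^{n} \binom{n}{\nu} p^\nu q^{n-\nu} e^{i\nu x} = \left( p e^{ix} + q \right)^n \qquad (x \in \mathbb{R}).$$

(b) The Poisson distribution $\pi_\alpha = \sum_{n=0}^{\infty} e^{-\alpha} \frac{\alpha^n}{n!} \varepsilon_n$ with parameter $\alpha > 0$
has Fourier transform

$$\hat{\pi}_\alpha(x) = e^{-\alpha} \sum_{n=0}^{\infty} \frac{\alpha^n}{n!} e^{inx} = e^{\alpha(e^{ix} - 1)} \qquad (x \in \mathbb{R}).$$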
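The two closed forms in Example 2 can be checked numerically against the defining sums. The Python sketch below does this with the standard library only; the parameter values and the truncation of the Poisson series after 60 terms are arbitrary choices made for this illustration.

```python
import cmath
import math

def fourier_transform(points, weights, x):
    """Sum of weights[k] * exp(i * x * points[k]): Fourier transform of a discrete measure on R."""
    return sum(w * cmath.exp(1j * x * a) for a, w in zip(points, weights))

def binomial_measure(n, p):
    """Support points and masses of the binomial distribution with parameters n and p."""
    q = 1.0 - p
    return list(range(n + 1)), [math.comb(n, v) * p**v * q**(n - v) for v in range(n + 1)]

def poisson_measure(alpha, terms=60):
    """Support points and masses of the Poisson distribution, truncated after `terms` atoms."""
    return list(range(terms)), [math.exp(-alpha) * alpha**k / math.factorial(k) for k in range(terms)]

x, n, p, alpha = 0.7, 12, 0.3, 2.5

binomial_direct = fourier_transform(*binomial_measure(n, p), x)
binomial_closed = (p * cmath.exp(1j * x) + (1.0 - p)) ** n

poisson_direct = fourier_transform(*poisson_measure(alpha), x)
poisson_closed = cmath.exp(alpha * (cmath.exp(1j * x) - 1.0))

# Both differences are at the level of rounding error.
print(abs(binomial_direct - binomial_closed), abs(poisson_direct - poisson_closed))
```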


It is not accidental that the Fourier transforms computed so far are all
continuous. Indeed, we have:


8.1.4. Theorem. For the Fourier transform $\hat{\mu}$ of every measure
$\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$, the following properties hold:


(a) $\hat{\mu}$ is uniformly continuous on $\mathbb{R}^p$.


(b) $|\hat{\mu}(x)| \le \|\mu\| = \hat{\mu}(0)$ for all $x \in \mathbb{R}^p$.


(c) $\hat{\mu}$ is positive-definite, that is, for arbitrary $n \in \mathbb{N}$, points $x_1, \ldots,
x_n \in \mathbb{R}^p$, and complex numbers $\lambda_1, \ldots, \lambda_n$,

$$\sum_{s,t=1}^{n} \hat{\mu}(x_s - x_t)\, \lambda_s \bar{\lambda}_t \ge 0. \qquad (8.1.9)$$

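Proof. (a) Since $\mu$ is regular, for every $\varepsilon > 0$ there exists a compact
set $K \subset \mathbb{R}^p$ such that $\mu(\complement K) < \varepsilon$. Then

$$\alpha = \sup \{ |y| : y \in K \} < +\infty$$

and hence, by the Cauchy-Schwarz inequality,

$$|\langle x_2 - x_1, y \rangle| \le |x_2 - x_1|\,|y| \le \alpha |x_2 - x_1|$$

for all $y \in K$ and arbitrary $x_1, x_2 \in \mathbb{R}^p$. Since

$$\left| e^{i\langle x_1, y \rangle} - e^{i\langle x_2, y \rangle} \right| = \left| 1 - e^{i\langle x_2 - x_1, y \rangle} \right|,$$

there is a $\delta > 0$ such that

$$\left| e^{i\langle x_1, y \rangle} - e^{i\langle x_2, y \rangle} \right| \le \varepsilon$$

for arbitrary $x_1, x_2 \in \mathbb{R}^p$ with $|x_1 - x_2| \le \delta$ and for all $y \in K$. But then
the following inequalities, that are valid for all such pairs $x_1, x_2 \in \mathbb{R}^p$,
show the uniform continuity of $\hat{\mu}$:

$$|\hat{\mu}(x_1) - \hat{\mu}(x_2)| \le \int \left| e^{i\langle x_1, y \rangle} - e^{i\langle x_2, y \rangle} \right| \mu(dy) = \int_K \left| e^{i\langle x_1, y \rangle} - e^{i\langle x_2, y \rangle} \right| \mu(dy) + \int_{\complement K} \left| e^{i\langle x_1, y \rangle} - e^{i\langle x_2, y \rangle} \right| \mu(dy) \le \varepsilon \mu(K) + 2\mu(\complement K) \le \varepsilon(\|\mu\| + 2).$$

(b) This follows from (8.1.3).


(c) On the left side of inequality (8.1.9) we have a $\mu$-integral with
integrand

$$f(y) = \sum_{s,t=1}^{n} \lambda_s \bar{\lambda}_t\, e^{i\langle x_s - x_t, y \rangle} = \left| \sum_{s=1}^{n} \lambda_s e^{i\langle x_s, y \rangle} \right|^2.$$

Hence $f \ge 0$ and thus $\int f \, d\mu \ge 0$. ∎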
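Part (c) of Theorem 8.1.4 rests on the identity between the double sum in (8.1.9) and the integral of a squared modulus. The Python sketch below checks this identity for a small discrete measure on $\mathbb{R}$; the measure, the points $x_s$, and the coefficients $\lambda_s$ are arbitrary values chosen for this illustration.

```python
import cmath

def fourier_transform(points, weights, x):
    """Fourier transform at x of the discrete measure sum_k weights[k] * (unit mass at points[k])."""
    return sum(w * cmath.exp(1j * x * y) for y, w in zip(points, weights))

def quadratic_form(points, weights, xs, lambdas):
    """Left side of (8.1.9): sum over s, t of mu_hat(x_s - x_t) * lambda_s * conj(lambda_t)."""
    return sum(
        fourier_transform(points, weights, x_s - x_t) * l_s * l_t.conjugate()
        for x_s, l_s in zip(xs, lambdas)
        for x_t, l_t in zip(xs, lambdas)
    )

def integral_of_square(points, weights, xs, lambdas):
    """Integral of |sum_s lambda_s * exp(i * x_s * y)|**2 with respect to the measure."""
    return sum(
        w * abs(sum(l * cmath.exp(1j * x * y) for x, l in zip(xs, lambdas))) ** 2
        for y, w in zip(points, weights)
    )

points, weights = [-1.0, 0.5, 2.0], [0.2, 1.5, 0.3]    # a finite measure on R
xs, lambdas = [0.0, 1.0, -2.5], [1 + 2j, -0.5j, 3 + 0j]

q = quadratic_form(points, weights, xs, lambdas)
# q is real and nonnegative up to rounding, and equals the integral of the square.
print(q, integral_of_square(points, weights, xs, lambdas))
```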


Remark. A well-known theorem of S. Bochner [25] tells us that, conversely,
every continuous positive-definite function $\varphi: \mathbb{R}^p \to \mathbb{C}$ is the
Fourier transform of a measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$. However, we do not make
use of this characterization of Fourier transforms.

Now we study the behavior of the mapping $\mu \to \hat{\mu}$ from $\mathcal{M}_+^b(\mathbb{R}^p)$ into
$\mathcal{C}^b(\mathbb{R}^p, \mathbb{C})$ relative to several operations in $\mathcal{M}_+^b(\mathbb{R}^p)$, in particular to the
convolution $*$.


8.1.5. Theorem. For arbitrary finite Borel measures $\mu$ and $\nu$ on $\mathbb{R}^p$:


(a) $\widehat{\mu + \nu} = \hat{\mu} + \hat{\nu}$.

(b) $\widehat{\alpha\mu} = \alpha\hat{\mu}$ ($\alpha \in \mathbb{R}_+$).

(c) $\widehat{\mu * \nu} = \hat{\mu}\,\hat{\nu}$.


(d) For every linear mapping $T$ of the vector space $\mathbb{R}^p$ into itself and its
transpose mapping $T^t$,¹

$$\widehat{T(\mu)} = \hat{\mu} \circ T^t. \qquad (8.1.10)$$

(e) For the reflection through the origin $x \to S(x) = -x$,

$$\widehat{S(\mu)} = \bar{\hat{\mu}} = \hat{\mu} \circ S. \qquad (8.1.11)$$

In particular, along with $\hat{\mu}$, the conjugate function $\bar{\hat{\mu}}$ is also the Fourier
transform of a measure from $\mathcal{M}_+^b(\mathbb{R}^p)$.
(f) For every translation $T_a(x) = x + a$ in $\mathbb{R}^p$ ($a \in \mathbb{R}^p$),

$$\widehat{T_a(\mu)} = \hat{\varepsilon}_a\, \hat{\mu}. \qquad (8.1.12)$$

1 If we represent $T$ (by fixing a coordinate system) by a matrix, then, as is familiar,
$T^t$ is described by the transpose matrix.

Proof. Equality (a) follows from the observation that for every bounded, measurable, complex function $f$ on $\mathbb{R}^p$: $\int f\,d(\mu + \nu) = \int f\,d\mu + \int f\,d\nu$. We obtain (b) similarly.


(c) By (3.4.5) we have $\int f\,d(\mu * \nu) = \iint f(y + z)\,\mu(dy)\,\nu(dz)$ for every $\mu * \nu$-integrable, real and hence also for every $\mu * \nu$-integrable, complex function $f$ on $\mathbb{R}^p$. If we choose $f(y) = e^{i\langle x,y\rangle}$ for given $x \in \mathbb{R}^p$, then (c) follows, since $e^{i\langle x,\,y+z\rangle} = e^{i\langle x,y\rangle}\,e^{i\langle x,z\rangle}$.


(d) First of all $T$ is continuous, and is thus a measurable mapping; therefore, along with $\mu$, $T(\mu)$ also lies in $\mathcal{M}_+^b(\mathbb{R}^p)$. By the Transformation Theorem we have $\int f\,dT(\mu) = \int f \circ T\,d\mu$ for every $T(\mu)$-integrable $f$, and in particular for $f(y) = e^{i\langle x,y\rangle}$ $(x \in \mathbb{R}^p)$. If we now take note of the equality $\langle x,T(y)\rangle = \langle T^t(x),y\rangle$, then we have the assertion:

$$\widehat{T(\mu)}(x) = \int e^{i\langle x,T(y)\rangle}\,\mu(dy) = \int e^{i\langle T^t(x),y\rangle}\,\mu(dy) = \hat\mu(T^t(x)).$$


(e) Obviously, $\overline{\hat\mu}(x) = \hat\mu(-x) = \hat\mu(S(x))$. The rest follows from (8.1.10) because $S^t = S$.


(f) By (3.4.10), $T_a(\mu) = \varepsilon_a * \mu$. Thus (8.1.12) follows from (c).

For product measures we have:


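8.1.6. Theorem. For the product $\mu \otimes \nu$ of measures $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ and $\nu \in \mathcal{M}_+^b(\mathbb{R}^q)$, the Fourier transform is given by

$$\widehat{\mu \otimes \nu}(x,y) = \hat\mu(x)\,\hat\nu(y) \qquad ((x,y) \in \mathbb{R}^{p+q}). \qquad (8.1.13)$$


Proof. The scalar product of the $(p + q)$-dimensional vectors $(x,y)$ and $(z,z')$ from $\mathbb{R}^p \times \mathbb{R}^q = \mathbb{R}^{p+q}$ is given by $\langle x,z\rangle + \langle y,z'\rangle$, that is, by the sum of the corresponding $p$- and $q$-dimensional scalar products. Hence

$$\widehat{\mu \otimes \nu}(x,y) = \iint e^{i\langle x,z\rangle}\,e^{i\langle y,z'\rangle}\,\mu(dz)\,\nu(dz') = \hat\mu(x)\,\hat\nu(y).$$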


In particular, we now have available the following properties of characteristic functions of random variables.

1. The characteristic function of the sum of independent $(\mathbb{R}^p,\mathcal{B}^p)$-random variables $X_1, \ldots, X_n$ is given by

$$\varphi_{X_1 + \cdots + X_n} = \varphi_{X_1} \cdot \ldots \cdot \varphi_{X_n}. \qquad (8.1.14)$$


2. For every linear mapping $T$ of the vector space $\mathbb{R}^p$ into itself, every $a \in \mathbb{R}^p$, and every $(\mathbb{R}^p,\mathcal{B}^p)$-random variable $X$,

$$\varphi_{a + T \circ X}(x) = e^{i\langle x,a\rangle}\,\varphi_X(T^t(x)) \qquad (x \in \mathbb{R}^p). \qquad (8.1.15)$$

In particular,

$$\varphi_{-X}(x) = \varphi_X(-x) = \overline{\varphi_X(x)} \qquad (x \in \mathbb{R}^p). \qquad (8.1.16)$$


Here 1 follows from Theorem 8.1.5 (c) since, by Theorem 5.3.4, $P_{X_1} * \cdots * P_{X_n}$ is the distribution of $X_1 + \cdots + X_n$. (8.1.15) and (8.1.16) follow from (8.1.10)-(8.1.12). We need note only the transitivity of image measures: If $T_a$ denotes the translation $x \mapsto x + a$, then $T_a \circ T \circ X(P) = T_a(T(P_X)) = \varepsilon_a * T(P_X)$ is the distribution of the random variable $a + T \circ X$. By Example 1, however, $\hat\varepsilon_a(x) = e^{i\langle x,a\rangle}$.
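
The rules (8.1.14)-(8.1.16) can be checked by hand for distributions with finitely many atoms on the real line, where every integral is a finite sum. The following is a minimal sketch under that assumption; the helper names (`char_fn`, `convolve`, `affine_image`) and the two sample distributions `die` and `coin` are illustrative choices, not part of the text. In dimension $p = 1$ the linear mapping $T$ is multiplication by a number $b$, and $T^t = T$.

```python
import cmath
from itertools import product


def char_fn(pmf, x):
    """Characteristic function sum_k p_k * exp(i*x*k) of a finitely supported distribution."""
    return sum(p * cmath.exp(1j * x * k) for k, p in pmf.items())


def convolve(pmf1, pmf2):
    """Convolution of two finitely supported distributions (law of X + Y, X and Y independent)."""
    out = {}
    for (k1, p1), (k2, p2) in product(pmf1.items(), pmf2.items()):
        out[k1 + k2] = out.get(k1 + k2, 0.0) + p1 * p2
    return out


def affine_image(pmf, a, b):
    """Image of the distribution under y -> a + b*y (law of a + b*X)."""
    out = {}
    for k, p in pmf.items():
        out[a + b * k] = out.get(a + b * k, 0.0) + p
    return out


die = {k: 1 / 6 for k in range(1, 7)}   # hypothetical sample distribution
coin = {0: 0.5, 1: 0.5}                 # hypothetical sample distribution
x = 0.7

# (8.1.14): the transform of the convolution is the product of the transforms
lhs_sum = char_fn(convolve(die, coin), x)
rhs_sum = char_fn(die, x) * char_fn(coin, x)

# (8.1.15) with p = 1 and T(y) = b*y
a, b = 2.0, -3.0
lhs_aff = char_fn(affine_image(die, a, b), x)
rhs_aff = cmath.exp(1j * x * a) * char_fn(die, b * x)

# (8.1.16): reflection corresponds to complex conjugation
lhs_neg = char_fn(affine_image(die, 0.0, -1.0), x)
rhs_neg = char_fn(die, x).conjugate()

print(abs(lhs_sum - rhs_sum), abs(lhs_aff - rhs_aff), abs(lhs_neg - rhs_neg))
```

All three printed differences should be at the level of floating-point rounding.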

In particular, all measures $\mu = f\lambda^p$ lie in $\mathcal{M}_+^b(\mathbb{R}^p)$, where $f \ge 0$ is an integrable function relative to the L-B-measure $\lambda^p$ on $\mathbb{R}^p$. By Theorem 2.9.3, the Fourier transform of $\mu$ is given by $\hat\mu(x) = \int e^{i\langle x,y\rangle} f(y)\,\lambda^p(dy)$. We shall soon see that it is useful to study this integral as a function of $x$ also for complex $\lambda^p$-integrable functions. Therefore we define:


8.1.7. Definition. For every $\lambda^p$-integrable function $f: \mathbb{R}^p \to \mathbb{C}$, the function $\hat f: \mathbb{R}^p \to \mathbb{C}$ defined by

$$\hat f(x) = \int e^{i\langle x,y\rangle} f(y)\,\lambda^p(dy) \qquad (8.1.17)$$

is called the Fourier transform of $f$.


Thus we have

$$\hat f = \widehat{u^+\lambda^p} - \widehat{u^-\lambda^p} + i\,\widehat{v^+\lambda^p} - i\,\widehat{v^-\lambda^p}, \qquad (8.1.18)$$

where $f = u + iv$ is the decomposition of $f$ into real and imaginary parts and $u = u^+ - u^-$, $v = v^+ - v^-$ are the decompositions of the latter into positive and negative parts. From this observation we obtain many properties of the mapping $f \mapsto \hat f$ from the corresponding properties of the mapping $\mu \mapsto \hat\mu$.


8.1.8. Theorem

1. For any two functions $f, g \in \mathcal{L}^1(\lambda^p,\mathbb{C})$:

(a) $\hat f$ is uniformly continuous.

(b) $f \mapsto \hat f$ is a linear mapping (over $\mathbb{C}$).

(c) $\widehat{f * g} = \hat f \cdot \hat g$.

(d) $\overline{\hat f} = \widehat{f^*}$, where $f^*$ denotes the function $f^*(x) = \overline{f(-x)}$.


2. For every pair of functions $f \in \mathcal{L}^1(\lambda^p,\mathbb{C})$ and $g \in \mathcal{L}^1(\lambda^q,\mathbb{C})$, $f \otimes g$ lies in $\mathcal{L}^1(\lambda^{p+q},\mathbb{C})$ and we have

$$\widehat{f \otimes g}(x,y) = \hat f(x)\,\hat g(y) \qquad ((x,y) \in \mathbb{R}^{p+q}). \qquad (8.1.19)$$


Proof. For 1: (a)-(c) are obtained directly from the above observation and Theorems 8.1.4 and 8.1.5. Property (d) follows from the definition of $\hat f$, when we take into account the reflection invariance of $\lambda^p$ proved in Section 1.7, Example 4.

For 2: Since $\mathcal{B}^p \otimes \mathcal{B}^q = \mathcal{B}^{p+q}$, the function $f \otimes g$ is $\mathcal{B}^{p+q}$-measurable. By Fubini's Theorem,

$$\int |f \otimes g|\,d\lambda^{p+q} = \int |f|\,d\lambda^p \cdot \int |g|\,d\lambda^q < +\infty,$$

whence follows the $\lambda^{p+q}$-integrability of $f \otimes g$. The rest of the assertion follows either by application of Fubini's Theorem to the integral defining $\widehat{f \otimes g}$ or from (8.1.13) via the above observation.


Examples 


3. The Fourier transform of the normal distribution $\nu_{a,\sigma^2} = g_{a,\sigma^2}\lambda^1$, and thus also of its density $g_{a,\sigma^2}$, is given by

$$\hat\nu_{a,\sigma^2}(x) = \hat g_{a,\sigma^2}(x) = e^{iax - \sigma^2 x^2/2} \qquad (a, x \in \mathbb{R},\ \sigma > 0).$$

It suffices to consider the case $a = 0$, $\sigma = 1$ of the standard normal distribution $\nu_{0,1}$, since $\nu_{a,\sigma^2} = T(\nu_{0,1})$ is the image of $\nu_{0,1}$ under the mapping $T(x) = a + \sigma x$ and since we have available the transformation properties of Theorem 8.1.5 (d) and (f). Thus we have to compute the integral

$$\hat\nu_{0,1}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{ixy}\,e^{-y^2/2}\,dy.$$

Using function theory, we can do this as follows: If we set $z = y - ix$, then $2ixy - y^2 = -z^2 - x^2$ and thus

$$\hat\nu_{0,1}(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \int_G e^{-z^2/2}\,dz,$$

where we have to integrate along the line $G = \{y - ix : y \in \mathbb{R}\}$. But

$$\int_G e^{-z^2/2}\,dz = \int_{-\infty}^{+\infty} e^{-y^2/2}\,dy = \sqrt{2\pi}$$

and hence

$$\hat\nu_{0,1}(x) = e^{-x^2/2}.$$

Indeed, since $z \mapsto e^{-z^2/2}$ is an entire holomorphic function, by integrating along the oriented boundary of the rectangle with vertices $y$, $-y$, $-y - ix$, $y - ix$, we obtain the equality

$$\int_{-y}^{y} e^{-z^2/2}\,dz = \int_{-y-ix}^{y-ix} e^{-z^2/2}\,dz + \int_{-y}^{-y-ix} e^{-z^2/2}\,dz + \int_{y-ix}^{y} e^{-z^2/2}\,dz.$$

The second and third integrals on the right side have absolute values that are bounded by a multiple of $e^{-y^2/2}$ due to the constant length $|x|$ of the integration path, and thus approach zero as $y \to +\infty$. The asserted equality then follows from the last equality by passing to the limit $y \to +\infty$.
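
As a numerical cross-check of Example 3, the defining integral of $\hat\nu_{0,1}$ can be approximated by a quadrature rule and compared with $e^{-x^2/2}$. The sketch below uses a plain trapezoidal rule on a truncated interval; the function name `normal_char_fn` and the truncation and step parameters are assumptions made for illustration only.

```python
import cmath
import math


def normal_char_fn(x, half_width=12.0, steps=24000):
    """Trapezoidal approximation of (1/sqrt(2*pi)) * int e^{ixy} e^{-y^2/2} dy over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0j
    for j in range(steps + 1):
        y = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0  # endpoint weights of the trapezoidal rule
        total += weight * cmath.exp(1j * x * y) * math.exp(-y * y / 2)
    return total * h / math.sqrt(2 * math.pi)


for x in (0.0, 0.5, 1.0, 2.5):
    approx = normal_char_fn(x)
    exact = math.exp(-x * x / 2)
    print(x, approx.real, exact, abs(approx - exact))
```

The imaginary part of the approximation vanishes up to rounding, in agreement with the symmetry of $\nu_{0,1}$.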


4. For the Cauchy distribution $\gamma_\alpha = c_\alpha\lambda^1$ $(\alpha > 0)$, we have

$$\hat\gamma_\alpha(x) = \hat c_\alpha(x) = e^{-\alpha|x|}.$$

It suffices to consider the case $\alpha = 1$, since $x \mapsto \alpha x$ maps the measure $\gamma_1$ onto $\gamma_\alpha$. Thus we have to compute the integral

$$\hat\gamma_1(x) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{e^{ixy}}{1 + y^2}\,dy.$$

Since $\hat\gamma_1(-x) = \hat\gamma_1(x)$ and $\hat\gamma_1(0) = 1$, we can assume $x > 0$. Again we use function theory. The meromorphic function

$$z \mapsto \frac{1}{\pi}\,\frac{e^{ixz}}{1 + z^2} = \frac{1}{\pi}\,\frac{e^{ixz}}{(z - i)(z + i)}$$

has a pole (of first order) in $z = i$ with residue $(1/2\pi i)\,e^{-x}$. Then by the Residue Theorem

$$\frac{1}{\pi} \int_{-r}^{r} \frac{e^{ixy}}{1 + y^2}\,dy + \frac{1}{\pi} \int_{H_r} \frac{e^{ixz}}{1 + z^2}\,dz = e^{-x}$$

if $r > 1$ and $H_r$ denotes the semicircle arc oriented from $+r$ to $-r$ which is the intersection of the half-plane $\Im z > 0$ and the circle of radius $r$ and center 0. Then obviously, for all $z \in H_r$,

$$\Bigl|\frac{e^{ixz}}{1 + z^2}\Bigr| \le \frac{1}{r^2 - 1}.$$

Therefore, as $r \to +\infty$, the integral over $H_r$ approaches zero, and we obtain $\hat\gamma_1(x) = e^{-x}$.


PROBLEMS


1. Let $f: \Omega \to \mathbb{R}^p$ be a mapping defined on a measure space $(\Omega,\mathcal{A},\mu)$. Denote by $f_1, \ldots, f_p$ the $p$ real components of $f$. The mapping $f$ is called $\mu$-integrable if all components $f_1, \ldots, f_p$ are $\mu$-integrable; the $p$-dimensional vector $(\int f_1\,d\mu, \ldots, \int f_p\,d\mu)$ is then called the $\mu$-integral of $f$ and denoted by $\int f\,d\mu$. Prove that Theorem 8.1.1 generalizes to such vector-valued functions $f$ if $|f|$ denotes the mapping $\omega \mapsto |f(\omega)|$ where $|x|$ is the Euclidean norm of a vector $x \in \mathbb{R}^p$.

2. Let $X$ be a real random variable satisfying $P\{X = 1\} = \cdots = P\{X = n\} = 1/n$. Prove that

$$x \mapsto \frac{e^{ix}}{n} \cdot \frac{e^{inx} - 1}{e^{ix} - 1}$$

is the characteristic function of $X$.
3. Prove that 
(x1, . . . jn) — (piei + + + + + puel) 


is the Fourier transform of the multinomial distribution of Section 4.4, 
Problem 4. 
4. Prove that

$$x \mapsto \frac{\sin(ax)}{ax}$$

is the Fourier transform of the so-called rectangular (or uniform) distribution $\rho_a$ of the interval $[-a,a]$:

$$\rho_a = \frac{1}{2a}\,1_{[-a,a]}\,\lambda^1,$$

where $a > 0$.


5. Prove that the following equality is valid for all measures $\mu, \nu \in \mathcal{M}_+^b(\mathbb{R}^p)$ and all points $x \in \mathbb{R}^p$:

$$\int e^{-i\langle x,y\rangle}\,\hat\mu(y)\,\nu(dy) = \int \hat\nu(y - x)\,\mu(dy).$$


6. Prove that formula (8.1.10) also holds for linear mappings $T: \mathbb{R}^p \to \mathbb{R}^q$. Deduce from this (and prove also directly) that

$$\widehat{\mu \otimes \nu}(x,0) = \hat\mu(x) \qquad (x \in \mathbb{R}^p)$$

holds for all measures $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ and $\nu \in \mathcal{M}_+^1(\mathbb{R}^q)$.

7. Let $X$ (resp. $Y$) be an $(\mathbb{R}^p,\mathcal{B}^p)$- (resp. $(\mathbb{R}^q,\mathcal{B}^q)$-) random variable on a given probability space. Prove: $X$ and $Y$ are independent if and only if their characteristic functions $\varphi_X$, $\varphi_Y$ and the characteristic function $\varphi_{X \otimes Y}$ of their joint distribution satisfy

$$\varphi_{X \otimes Y}(x,y) = \varphi_X(x)\,\varphi_Y(y)$$

for all $x \in \mathbb{R}^p$ and all $y \in \mathbb{R}^q$.

8. Let X and Y be normally distributed real random variables on a 
probability space. Use Problem 7 in order to prove that X + Y and 
X — Y are independent random variables. 


8.2 UNIQUENESS AND CONTINUITY THEOREMS 


The next theorem, due to Riemann-Lebesgue, whose proof is preceded 
by a lemma, paves the way to all deeper properties of Fourier transforms, 
and in particular to the two main theorems, namely, to the Uniqueness 
and Continuity Theorem. 


8.2.1. Lemma. For every $f \in \mathcal{L}^1(\lambda^p,\mathbb{C})$, the function $\varphi: \mathbb{R}^p \to \mathbb{R}$, defined by

$$\varphi(t) = \int |f(x + t) - f(x)|\,\lambda^p(dx),$$

is continuous at the point $t = 0$.


Proof. Since $\lambda^p$ is translation-invariant, the integrability of $f$ implies that of $x \mapsto f(x + t)$ and hence the existence of the integral $\varphi(t)$. We prove the continuity of $\varphi$ at $t = 0$ in two steps.


Step 1: Let $f \in \mathcal{K}(\mathbb{R}^p,\mathbb{C})$ and let $S_f$ be the compact support of $f$. Since $f$ is uniformly continuous, for every $\varepsilon > 0$ there is a $\delta > 0$ such that $|f(x_1) - f(x_2)| < \varepsilon$ for all $x_1, x_2 \in \mathbb{R}^p$ with $|x_1 - x_2| < \delta$. Further, if $U$ is an open, relatively compact neighborhood of $S_f$ (say a sufficiently large open sphere), then we can take $\delta$ so small that with every point $x \in S_f$, the open ball of center $x$ and radius $\delta$ is contained in $U$. Therefore, for every $t \in \mathbb{R}^p$ with $|t| < \delta$, we have, on the one hand, $|f(x + t) - f(x)| < \varepsilon$ for all $x \in \mathbb{R}^p$; on the other hand, the support of $x \mapsto f(x + t)$ is equal to $S_f - t$ and is therefore contained in $U$. We thus obtain

$$\varphi(t) = \int |f(x + t) - f(x)|\,\lambda^p(dx) \le \varepsilon\,\lambda^p(U)$$

for all $t$ satisfying $|t| < \delta$, that is, since $\varphi(0) = 0$ we have the desired continuity at the origin.


Step 2: Now let $f \in \mathcal{L}^1(\lambda^p,\mathbb{C})$ be arbitrary. By applying Corollary 7.5.5 to the real and imaginary part of $f$, for each $\varepsilon > 0$ we obtain the existence of a $g \in \mathcal{K}(\mathbb{R}^p,\mathbb{C})$ such that $\int |f - g|\,d\lambda^p \le \varepsilon$. Since $\lambda^p$ is translation-invariant, we also have

$$\int |f(x + t) - g(x + t)|\,\lambda^p(dx) = \int |f - g|\,d\lambda^p \le \varepsilon$$

for all $t \in \mathbb{R}^p$. Using the triangle inequality, we obtain the majorization

$$\varphi(t) \le \int |f(x + t) - g(x + t)|\,\lambda^p(dx) + \int |g(x + t) - g(x)|\,\lambda^p(dx) + \int |g - f|\,d\lambda^p \le 2\varepsilon + \int |g(x + t) - g(x)|\,\lambda^p(dx).$$

According to Step 1, the remaining integral approaches zero as $t \to 0$. Hence it follows that $\lim_{t \to 0} \varphi(t) = \varphi(0) = 0$.
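
A concrete instance may help to see what Lemma 8.2.1 asserts for a discontinuous $f$. For $p = 1$ and $f = 1_{[0,1]}$ one computes directly $\varphi(t) = 2\min(|t|,1)$, which tends to 0 as $t \to 0$ although $f$ itself is not continuous. The sketch below approximates $\varphi(t)$ by a midpoint sum; the function names and the grid parameters are illustrative assumptions.

```python
def indicator(x):
    """Indicator function of the interval [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def translation_modulus(t, lo=-3.0, hi=3.0, steps=60000):
    """Midpoint-rule approximation of phi(t) = int |f(x + t) - f(x)| dx for f = indicator."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        total += abs(indicator(x + t) - indicator(x))
    return total * h


for t in (0.5, 0.1, 0.01, 0.001):
    print(t, translation_modulus(t), 2 * min(abs(t), 1.0))
```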


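It is now a simple matter to obtain the next theorem, which represents a sharpening of the uniform continuity of Fourier transforms $\hat f$ of functions, which we have already shown.


8.2.2. Theorem (Riemann-Lebesgue). The Fourier transform $\hat f$ of a function $f \in \mathcal{L}^1(\lambda^p,\mathbb{C})$ vanishes at infinity.


Proof. Since $e^{i\pi} = -1$ and because of the translation-invariance of $\lambda^p$ we have for every $x \ne 0$ in $\mathbb{R}^p$:

$$\hat f(x) = \frac12 \int e^{i\langle x,y\rangle} f(y)\,\lambda^p(dy) - \frac12 \int e^{i\langle x,\,y + (\pi/|x|^2)x\rangle} f(y)\,\lambda^p(dy)$$
$$= \frac12 \int e^{i\langle x,y\rangle} f(y)\,\lambda^p(dy) - \frac12 \int e^{i\langle x,y\rangle} f\Bigl(y - \frac{\pi}{|x|^2}\,x\Bigr)\,\lambda^p(dy)$$
$$= \frac12 \int e^{i\langle x,y\rangle} \Bigl[f(y) - f\Bigl(y - \frac{\pi}{|x|^2}\,x\Bigr)\Bigr]\,\lambda^p(dy).$$

Hence,

$$|\hat f(x)| \le \frac12 \int \Bigl|f\Bigl(y - \frac{\pi}{|x|^2}\,x\Bigr) - f(y)\Bigr|\,\lambda^p(dy),$$

and the assertion follows from the above lemma, since $|(\pi/|x|^2)\,x| = \pi/|x|$ approaches zero as $|x| \to +\infty$.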


Remark. We have $\hat\varepsilon_0 = 1$. Thus the Fourier transform of a measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ does not vanish at infinity in general.
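
The contrast between Theorem 8.2.2 and the preceding remark can be made concrete for $p = 1$. For $f = 1_{[-1,1]}$ an elementary computation gives $\hat f(x) = 2\sin(x)/x$, which is bounded by $2/|x|$, while $\hat\varepsilon_0$ is constant. The following sketch approximates $\hat f$ by a midpoint rule; the function names and step count are assumptions for illustration, not notation from the text.

```python
import cmath
import math


def ft_indicator(x, steps=20000):
    """Midpoint-rule approximation of the Fourier transform int_{-1}^{1} e^{ixy} dy of 1_[-1,1]."""
    h = 2.0 / steps
    return h * sum(cmath.exp(1j * x * (-1.0 + (j + 0.5) * h)) for j in range(steps))


def ft_dirac_zero(x):
    """Fourier transform of the unit mass at 0: the integral of e^{ixy} against epsilon_0."""
    return cmath.exp(1j * x * 0.0)


for x in (1.0, 10.0, 100.0, 1000.0):
    approx = ft_indicator(x)
    exact = 2 * math.sin(x) / x
    print(x, abs(approx), abs(exact), 2 / x, abs(ft_dirac_zero(x)))
```

The third printed column is the bound $2/|x|$; the last column stays equal to 1 for every $x$.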


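8.2.3. Corollary. The set $\mathcal{F}$ of all Fourier transforms $\hat f$ of functions $f \in \mathcal{L}^1(\lambda^p,\mathbb{C})$ is dense in $\mathcal{C}^0(\mathbb{R}^p,\mathbb{C})$ relative to uniform convergence.

Proof. By Theorems 8.1.8 and 8.2.2, $\mathcal{F}$ is a linear subspace of $\mathcal{C}^0(\mathbb{R}^p,\mathbb{C})$ which, along with any two functions $\hat f$, $\hat g$, contains the product $\hat f \cdot \hat g$ and, along with every $\hat f$, contains the conjugate function $\overline{\hat f}$. Thus $\mathcal{F}$ is a self-adjoint subalgebra of $\mathcal{C}^0(\mathbb{R}^p,\mathbb{C})$. By the Stone-Weierstrass² Theorem we need to show only the following properties (a) and (b):


(a) For every $x_0 \in \mathbb{R}^p$ there exists an $\hat f \in \mathcal{F}$ with $\hat f(x_0) \ne 0$. Indeed, let $V$ be a compact neighborhood of $x_0$ and let $f$ be the function

$$y \mapsto 1_V(y)\,e^{-i\langle x_0,y\rangle}$$

lying in $\mathcal{L}^1(\lambda^p,\mathbb{C})$. Then $\hat f(x_0) = \lambda^p(V) > 0$ since $V$ contains, for example, a nonempty half-open interval.


(b) For every pair of points $x_0 \ne y_0$ from $\mathbb{R}^p$ there exists an $\hat f \in \mathcal{F}$ such that $\hat f(x_0) \ne \hat f(y_0)$. Indeed, choose $\alpha \in \mathbb{R}$ such that $e^{i\langle x_0 - y_0,\,z_0\rangle} \ne 1$ for $z_0 = \alpha(x_0 - y_0)$. Then there is a compact neighborhood $W$ of $z_0$ with $e^{i\langle x_0,y\rangle} \ne e^{i\langle y_0,y\rangle}$ for all $y \in W$. The function

$$f(y) = 1_W(y)\,\bigl(e^{-i\langle x_0,y\rangle} - e^{-i\langle y_0,y\rangle}\bigr)$$

lying in $\mathcal{L}^1(\lambda^p,\mathbb{C})$, has a Fourier transform for which

$$\hat f(x_0) - \hat f(y_0) = \int_W |e^{i\langle x_0,y\rangle} - e^{i\langle y_0,y\rangle}|^2\,\lambda^p(dy) > 0.$$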
The main theorem of the theory is now easy to prove: 


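8.2.4. Theorem (Uniqueness Theorem). The mapping $\mu \mapsto \hat\mu$ of $\mathcal{M}_+^b(\mathbb{R}^p)$ into $\mathcal{C}^b(\mathbb{R}^p,\mathbb{C})$ is injective. Thus every measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ is uniquely determined by its Fourier transform $\hat\mu$.


Proof. We have to prove the equality of any two measures $\mu, \nu \in \mathcal{M}_+^b(\mathbb{R}^p)$ for which $\hat\mu = \hat\nu$. Let $f$ be arbitrarily chosen in $\mathcal{L}^1(\lambda^p,\mathbb{C})$. Then, by Fubini's Theorem,

$$\int \hat f\,d\mu = \iint e^{i\langle x,y\rangle} f(y)\,\lambda^p(dy)\,\mu(dx) = \iint e^{i\langle x,y\rangle} f(y)\,\mu(dx)\,\lambda^p(dy) = \int \hat\mu f\,d\lambda^p$$

and correspondingly,

$$\int \hat f\,d\nu = \int \hat\nu f\,d\lambda^p.$$

Thus we obtain

$$\int \hat f\,d\mu = \int \hat f\,d\nu, \qquad \text{for all } \hat f \in \mathcal{F}.$$

Since $\mu$ and $\nu$ are finite measures, it follows by Corollary 8.2.3 that

$$\int h\,d\mu = \int h\,d\nu, \qquad \text{for all } h \in \mathcal{C}^0(\mathbb{R}^p,\mathbb{C}).$$

We need note only that for any two functions $p, q \in \mathcal{C}^b(\mathbb{R}^p,\mathbb{C})$, the inequalities

$$\Bigl|\int p\,d\mu - \int q\,d\mu\Bigr| \le \int |p - q|\,d\mu \le \|p - q\| \cdot \|\mu\|$$

and the corresponding ones for $\nu$ hold. Since $\mathcal{K}(\mathbb{R}^p) \subset \mathcal{C}^0(\mathbb{R}^p,\mathbb{C}) \subset \mathcal{C}^b(\mathbb{R}^p,\mathbb{C})$, it follows that $\int h\,d\mu = \int h\,d\nu$ for all $h \in \mathcal{K}(\mathbb{R}^p)$. But then $\mu = \nu$ by Theorem 7.5.4.


² It suffices to note that $\mathcal{F}$ contains the real part $\Re\hat f = \frac12(\hat f + \overline{\hat f})$ and the imaginary part $\Im\hat f = (1/2i)(\hat f - \overline{\hat f})$ of every function $\hat f \in \mathcal{F}$ as elements. Application of the Stone-Weierstrass Theorem to the set of real functions in $\mathcal{F}$ then yields the assertion (see footnote 15 in Chapter 7).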


Thus every problem on finite Borel measures on $\mathbb{R}^p$ can be transformed into an equivalent problem with regard to their Fourier transforms. Herein, and in the property of transforming the convolution product into the ordinary product, lies the significance of Fourier transforms. At the same time, this justifies the terminology "characteristic function" for the Fourier transform of the distribution of a random variable. The distribution of an $(\mathbb{R}^p,\mathcal{B}^p)$-random variable $X$ is uniquely determined by the characteristic function $\varphi_X$. The examples below are the first illustrations of these statements.

However, we first prove the analog of Theorem 8.2.4 for functions. 


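8.2.5. Corollary. For any two functions $f, g \in \mathcal{L}^1(\lambda^p,\mathbb{C})$:

$$\hat f = \hat g \implies f = g \quad \lambda^p\text{-almost everywhere}.$$


Proof. First, let $f$ and $g$ be real-valued. Then by (8.1.18) the measures $(f^+ + g^-)\lambda^p$ and $(g^+ + f^-)\lambda^p$ have the same Fourier transform; thus, $(f^+ + g^-)\lambda^p = (g^+ + f^-)\lambda^p$. Then, by Theorem 2.9.4, the functions $f^+ + g^-$ and $g^+ + f^-$, and hence also $f = f^+ - f^-$ and $g = g^+ - g^-$, are $\lambda^p$-almost everywhere equal. The general case follows from the remark that $\hat f = \hat g$ implies the equalities $\widehat{\Re f} = \widehat{\Re g}$ and $\widehat{\Im f} = \widehat{\Im g}$. Indeed,

$$\widehat{\Re f} = \tfrac12\bigl(\hat f + \hat{\bar f}\bigr) \quad\text{and}\quad \widehat{\Im f} = \tfrac{1}{2i}\bigl(\hat f - \hat{\bar f}\bigr)$$

as well as

$$\hat{\bar f}(x) = \overline{\hat f(-x)}$$

for all $x \in \mathbb{R}^p$ by (8.1.17) and the reflection-invariance of $\lambda^p$. Corresponding formulas for $g$ prove this remark.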


Examples 


1. The Fourier transform $\hat\mu$ of a measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ is real-valued if and only if $\mu$ is invariant with respect to the reflection $S$ through the origin. If $\hat\mu$ is real, then (8.1.11) tells us that $S(\mu)$ and $\mu$ have the same Fourier transform. But then, by Theorem 8.2.4, $S(\mu) = \mu$. The converse follows directly from (8.1.11).


2. The Fourier transform $\hat f$ of a function $f \in \mathcal{L}^1(\lambda^p,\mathbb{C})$ is real-valued if and only if $f(x) = \overline{f(-x)}$ holds $\lambda^p$-almost everywhere on $\mathbb{R}^p$. This follows analogously from Theorem 8.1.8(d) and Corollary 8.2.5.


3. In Section 5.3, Remark 3, we mentioned that the sum $X + Y$ of real, independent random variables $X$ and $Y$ with the Cauchy distributions $\gamma_\alpha$ and $\gamma_\beta$ respectively $(\alpha, \beta > 0)$ has distribution $\gamma_{\alpha+\beta}$. The proof can now be presented as follows: By (8.1.14) and Section 8.1, Example 4,

$$\varphi_{X+Y}(x) = \varphi_X(x)\,\varphi_Y(x) = e^{-(\alpha+\beta)|x|} \qquad (x \in \mathbb{R}).$$

But this is the characteristic function of a random variable with distribution $\gamma_{\alpha+\beta}$. Therefore, $\gamma_{\alpha+\beta}$ is the distribution of $X + Y$. In other words, $\gamma_\alpha * \gamma_\beta = \gamma_{\alpha+\beta}$.


4. The reader can prove, analogously to Example 3, the result from Section 5.3 according to which the sum of independent, normally distributed random variables is normally distributed.


5. Example 3 shows that the validity of $P_{X+Y} = P_X * P_Y$ for real random variables $X$, $Y$ in general does not imply their independence. We need only choose $X = Y$ and $P_X = \gamma_1$.
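
The identity $\gamma_\alpha * \gamma_\beta = \gamma_{\alpha+\beta}$ of Example 3 can also be observed on the level of densities. The sketch below assumes the usual Cauchy density $c_\alpha(x) = \alpha/(\pi(\alpha^2 + x^2))$ (the density is not restated in this section, so this form is an assumption, chosen to be consistent with $\hat\gamma_\alpha(x) = e^{-\alpha|x|}$) and approximates the convolution integral by a truncated trapezoidal rule. Function names and numerical parameters are illustrative.

```python
import math


def cauchy_density(alpha, x):
    """Assumed density c_alpha(x) = alpha / (pi * (alpha^2 + x^2)) of gamma_alpha."""
    return alpha / (math.pi * (alpha * alpha + x * x))


def convolved_density(alpha, beta, x, half_width=400.0, steps=80000):
    """Trapezoidal approximation of int c_alpha(x - y) * c_beta(y) dy over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        y = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cauchy_density(alpha, x - y) * cauchy_density(beta, y)
    return total * h


alpha, beta = 0.5, 1.5
for x in (0.0, 1.0, 3.0):
    print(x, convolved_density(alpha, beta, x), cauchy_density(alpha + beta, x))

# Example 5: for X = Y with distribution gamma_1, the sum 2X has density c_1(x/2)/2,
# which coincides with c_2 = density of gamma_1 * gamma_1, although X and Y are not independent.
print(0.5 * cauchy_density(1.0, 0.7 / 2), cauchy_density(2.0, 0.7))
```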


The second main theorem of the theory concerns continuity properties of the mapping $\mu \mapsto \hat\mu$. We facilitate the proof by a lemma which sharpens Theorem 8.1.4(a).


8.2.6. Lemma. For every weakly convergent sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+^b(\mathbb{R}^p)$, the sequence $(\hat\mu_n)_{n \in \mathbb{N}}$ of Fourier transforms is equi-uniformly continuous on $\mathbb{R}^p$.³

³ This means: For every $\varepsilon > 0$ there is a $\delta > 0$ such that $|\hat\mu_n(x_1) - \hat\mu_n(x_2)| \le \varepsilon$ for any two points $x_1, x_2 \in \mathbb{R}^p$ with $|x_1 - x_2| < \delta$ and every $n \in \mathbb{N}$.


Proof. It suffices to modify the proof of Theorem 8.1.4(a) slightly. Let $\mu$ be the weak limit of $(\mu_n)$. By (7.7.5), for every $\varepsilon > 0$ there is a $u \in \mathcal{K}(\mathbb{R}^p)$ such that $\int (1 - u)\,d\mu < \varepsilon$ and $0 \le u \le 1$. The proof of Theorem 8.1.4(a) tells us that then there is a $\delta > 0$ such that

$$|e^{i\langle x_1,y\rangle} - e^{i\langle x_2,y\rangle}| \le \varepsilon$$

for all $x_1, x_2 \in \mathbb{R}^p$ with $|x_1 - x_2| \le \delta$ and all $y$ from the compact support $S_u$ of $u$. Now $\lim_{n \to \infty} \int (1 - u)\,d\mu_n = \int (1 - u)\,d\mu$, and thus there is a natural number $n_0$ such that $\int (1 - u)\,d\mu_n < \varepsilon$ for all $n \ge n_0$. For all pairs $x_1, x_2 \in \mathbb{R}^p$ with $|x_1 - x_2| \le \delta$ and all $n \ge n_0$, it then follows that

$$|\hat\mu_n(x_1) - \hat\mu_n(x_2)| \le \int |e^{i\langle x_1,y\rangle} - e^{i\langle x_2,y\rangle}|\,\mu_n(dy)$$
$$= \int |e^{i\langle x_1,y\rangle} - e^{i\langle x_2,y\rangle}|\,u(y)\,\mu_n(dy) + \int |e^{i\langle x_1,y\rangle} - e^{i\langle x_2,y\rangle}|\,(1 - u(y))\,\mu_n(dy)$$
$$\le \varepsilon\,\mu_n(S_u) + 2\int (1 - u)\,d\mu_n \le \varepsilon\,(\|\mu_n\| + 2).$$

But, due to the weak convergence, the sequence $(\|\mu_n\|)$ is convergent and hence bounded. From this and from Theorem 8.1.4(a), applied to the Fourier transforms $\hat\mu_1, \ldots, \hat\mu_{n_0 - 1}$, the assertion now follows.

8.2.7. Theorem (Continuity Theorem). For every sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+^b(\mathbb{R}^p)$:


1. If $(\mu_n)$ converges weakly to $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$, then the sequence of Fourier transforms $(\hat\mu_n)$ converges uniformly to $\hat\mu$ on every compact subset of $\mathbb{R}^p$.


2. If the sequence $(\hat\mu_n)$ of Fourier transforms converges pointwise to a complex function $\varphi$ on $\mathbb{R}^p$ continuous at $x = 0$, then $\varphi$ is the Fourier transform of a (uniquely determined) measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ and the sequence $(\mu_n)$ converges weakly to $\mu$.


Proof. 1. Each of the functions $y \mapsto e^{i\langle x,y\rangle}$ lies in $\mathcal{C}^b(\mathbb{R}^p,\mathbb{C})$. For every $x \in \mathbb{R}^p$, $\lim_{n \to \infty} \hat\mu_n(x) = \hat\mu(x)$ by the definition of weak convergence. Thus, for every $\varepsilon > 0$ and $x \in \mathbb{R}^p$ there exists a natural number $n_x$ such that $|\hat\mu_n(x) - \hat\mu(x)| \le \varepsilon$ for all $n \ge n_x$. By Lemma 8.2.6 the sequence $(\hat\mu_n - \hat\mu)_{n \in \mathbb{N}}$ is equi-uniformly continuous on $\mathbb{R}^p$. Thus, there is a $\delta > 0$ such that for $x_1, x_2 \in \mathbb{R}^p$ with $|x_1 - x_2| \le \delta$ and all $n = 1, 2, \ldots$

$$|\hat\mu_n(x_1) - \hat\mu(x_1) - \hat\mu_n(x_2) + \hat\mu(x_2)| \le \varepsilon.$$

This now implies $|\hat\mu_n(y) - \hat\mu(y)| \le 2\varepsilon$ for all points $y$ in the open ball $K_\delta(x)$ of center $x$ and radius $\delta$ and all $n \ge n_x$. Hence the assertion follows, since every compact set $K \subset \mathbb{R}^p$ is covered by finitely many balls $K_\delta(x^1), \ldots, K_\delta(x^r)$ with centers $x^1, \ldots, x^r \in K$, and thus

$$|\hat\mu_n(y) - \hat\mu(y)| \le 2\varepsilon$$

for all $y \in K$ and all $n \ge \max(n_{x^1}, \ldots, n_{x^r})$.


2. First we reduce the assertion to the following auxiliary assertion:

(H) Let $(\nu_n)$ be a sequence in $\mathcal{M}_+^b(\mathbb{R}^p)$ converging vaguely to a measure $\nu \in \mathcal{M}_+^b(\mathbb{R}^p)$, for which the sequence $(\hat\nu_n)$ converges pointwise on $\mathbb{R}^p$ to a function $\psi: \mathbb{R}^p \to \mathbb{C}$ continuous at $x = 0$. Then $\lim \|\nu_n\| = \|\nu\|$.

To prove "(H) $\Rightarrow$ (2)," let $(\mu_n)$ be a sequence in $\mathcal{M}_+^b(\mathbb{R}^p)$ with the properties noted in 2. Then $\varphi(0) = \lim \hat\mu_n(0)$, and thus, by Theorem 8.1.4(b), the sequence $(\|\mu_n\|)$ converges and hence is bounded, that is,

$$\alpha = \sup_{n \in \mathbb{N}} \|\mu_n\| < +\infty.$$

By Corollary 7.8.3 and Theorem 7.8.4 there now exists a vaguely convergent subsequence $(\nu_n)$ of the sequence $(\mu_n)$; let the vague limit be $\nu \in \mathcal{M}_+^b(\mathbb{R}^p)$. Since $(\hat\nu_n)$ also converges pointwise to $\varphi$, (H) yields the convergence $\lim \|\nu_n\| = \|\nu\|$. Thus by Theorem 7.7.7, $(\nu_n)$ is weakly convergent to $\nu$, that is, by 1: $\hat\nu(x) = \lim \hat\nu_n(x) = \varphi(x)$ for all $x \in \mathbb{R}^p$. Thus, $\varphi = \hat\nu$ and hence, by the Uniqueness Theorem, $\nu$ is uniquely determined. All vaguely convergent subsequences of $(\mu_n)$ therefore have the same limit $\nu$. Since $(\mu_n)$ is a sequence in the compact metrizable space of all $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ with $\|\mu\| \le \alpha$, we have the vague convergence of $(\mu_n)$ to $\nu$ and since

$$\lim \|\mu_n\| = \varphi(0) = \hat\nu(0) = \|\nu\|,$$

we also have weak convergence to $\nu$. But this is the assertion of 2.

Now we prove (H): The sequence $\hat\nu_n(0) = \|\nu_n\|$, $n = 1, 2, \ldots$ is convergent and thus bounded: $\alpha = \sup \|\nu_n\| < +\infty$. Since $|\hat\nu_n| \le \|\nu_n\|$, the sequence $(\hat\nu_n)$ is uniformly bounded on $\mathbb{R}^p$ by $\alpha$, and thus $|\psi| \le \alpha$. As in Section 7.7, Example 4, let $K \ge 0$ be a function from $\mathcal{L}^1(\lambda^p)$ satisfying $\int K\,d\lambda^p = 1$. We again set $K_r(x) = r^p K(rx)$ for each $r > 0$ and

$$s_{r,n} = K_r * \hat\nu_n, \qquad s_r = K_r * \hat\nu, \qquad S_r = K_r * \psi$$

for all $r > 0$ and $n = 1, 2, \ldots$. These definitions are meaningful since $\hat\nu_n$ and $\hat\nu$ are bounded continuous functions and since $\psi$ is bounded and $\mathcal{B}^p$-measurable as the limit of $(\hat\nu_n)$. According to Section 7.7, Example 4, and Theorem 7.7.10, the continuity of $\hat\nu$ and $\psi$ at $x = 0$ implies

$$\lim_{r \to +\infty} s_r(0) = \hat\nu(0), \qquad \lim_{r \to +\infty} S_r(0) = \psi(0).$$

In fact, we have $\lim_{r \to +\infty} K_r\lambda^p = \varepsilon_0$ in the sense of the vague topology. Since $K_r\lambda^p$ and $\varepsilon_0$ are all probability measures, $K_r\lambda^p$ converges to $\varepsilon_0$ relative to the weak topology. For every bounded, Borel-measurable function $f$ continuous at $x = 0$, we have, by Theorem 7.7.10,

$$\lim_{r \to +\infty} \int f(y)\,K_r(y)\,\lambda^p(dy) = f(0).$$

By choosing the functions $y \mapsto \hat\nu(-y)$ and $y \mapsto \psi(-y)$ for $f$, we obtain the asserted results.

According to Fubini,

$$s_{r,n}(0) = \int K_r(y)\,\hat\nu_n(-y)\,\lambda^p(dy) = \int K_r(-y)\,\hat\nu_n(y)\,\lambda^p(dy) = \iint K_r(-y)\,e^{i\langle y,z\rangle}\,\nu_n(dz)\,\lambda^p(dy) = \int q_r\,d\nu_n,$$

and analogously

$$s_r(0) = \int q_r\,d\nu,$$

where

$$q_r(z) = \int e^{i\langle z,y\rangle}\,K_r(-y)\,\lambda^p(dy) \qquad (z \in \mathbb{R}^p).$$

For every $r > 0$, $q_r$ is then the Fourier transform of the function $y \mapsto K_r(-y)$, that is, by the Riemann-Lebesgue Theorem, $q_r \in \mathcal{C}^0(\mathbb{R}^p,\mathbb{C})$. With the help of Theorem 7.7.5 we now obtain

$$\lim_{n \to \infty} s_{r,n}(0) = s_r(0) \qquad (r > 0).$$

Since $(\hat\nu_n)$ converges pointwise to $\psi$ and $|\hat\nu_n| \le \alpha$ for all $n$, it follows from

$$s_{r,n}(0) = \int K_r(-y)\,\hat\nu_n(y)\,\lambda^p(dy)$$

and the Dominated Convergence Theorem that

$$\lim_{n \to \infty} s_{r,n}(0) = \int K_r(-y)\,\psi(y)\,\lambda^p(dy) = S_r(0) \qquad (r > 0).$$

Thus $s_r(0) = S_r(0)$ for all $r > 0$; passage to the limit $r \to +\infty$ yields $\hat\nu(0) = \psi(0)$. But then

$$\|\nu\| = \hat\nu(0) = \psi(0) = \lim_{n \to \infty} \hat\nu_n(0) = \lim_{n \to \infty} \|\nu_n\|.$$


no n— o 


Since the weak topology on $\mathcal{M}_+^b(\mathbb{R}^p)$ is metrizable by Section 7.8, Remark 2, Theorem 8.2.7 says in particular the following: On the set $\widehat{\mathcal{M}}_+^b$ of all Fourier transforms $\hat\mu$ of measures $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$, the topology of pointwise convergence coincides with the topology of uniform convergence on compact subsets. The mapping $\mu \mapsto \hat\mu$ is a homeomorphism between $\mathcal{M}_+^b(\mathbb{R}^p)$ (equipped with the weak topology) and $\widehat{\mathcal{M}}_+^b$ (equipped with the topology of pointwise convergence).
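
A classical illustration of Theorem 8.2.7, not taken from this section, is the Poisson limit: the binomial distributions with parameters $n$ and $\lambda/n$ have characteristic functions $(1 - \lambda/n + (\lambda/n)e^{ix})^n$, which converge pointwise to $\exp(\lambda(e^{ix} - 1))$, the characteristic function of the Poisson distribution with parameter $\lambda$. By part 2 the binomial distributions then converge weakly, and by part 1 the convergence of the transforms is uniform on compact sets. The sketch below measures the sup-distance on a grid in $[-10,10]$; the closed-form characteristic functions are standard facts, while the function names and the grid are illustrative assumptions.

```python
import cmath


def binomial_char_fn(n, p, x):
    """Characteristic function of the binomial distribution with parameters n and p."""
    return (1 - p + p * cmath.exp(1j * x)) ** n


def poisson_char_fn(lam, x):
    """Characteristic function of the Poisson distribution with parameter lam."""
    return cmath.exp(lam * (cmath.exp(1j * x) - 1))


def sup_distance(n, lam, grid):
    """Largest distance between the two characteristic functions over the given grid."""
    return max(abs(binomial_char_fn(n, lam / n, x) - poisson_char_fn(lam, x)) for x in grid)


lam = 2.0
grid = [-10 + 0.01 * j for j in range(2001)]  # grid on the compact interval [-10, 10]
for n in (10, 100, 1000, 10000):
    print(n, sup_distance(n, lam, grid))
```

The printed distances shrink roughly like $1/n$.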


PROBLEMS 


1. Prove or disprove: For every vaguely convergent sequence $(\mu_n)$ in $\mathcal{M}_+^b(\mathbb{R}^p)$ the sequence $(\hat\mu_n)$ of Fourier transforms converges pointwise on $\mathbb{R}^p$.

2. Consider the sequence $(\rho_n)_{n \in \mathbb{N}}$ of rectangular distributions defined in Section 8.1, Problem 4. Prove: $(\rho_n)$ converges vaguely and $(\hat\rho_n)$ converges pointwise. Can the Continuity Theorem be applied to this situation?


3. Define as usual the Gamma function $\Gamma$ on $]0,+\infty[$ by the integral

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx.$$

Then

$$f_t(x) = \begin{cases} \dfrac{1}{\Gamma(t)}\,x^{t-1} e^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$

is for each $t > 0$ the $\lambda^1$-density of a probability measure $\mu_t = f_t\lambda^1$. $\mu_t$ is called the Gamma distribution with parameter $t$.

(a) Calculate the Fourier transform of $\mu_t$.

(b) Let $X_1, \ldots, X_n$ be $n$ independent real random variables with $\nu_{0,1}$ as common distribution. Prove that the distribution of $X_1^2 + \cdots + X_n^2$ has the function

$$g_n(x) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)}\,x^{n/2 - 1} e^{-x/2}, & x > 0 \\ 0, & x \le 0 \end{cases}$$

as $\lambda^1$-density. The probability measure $g_n\lambda^1$ is called the $\chi^2$-distribution with $n$ degrees of freedom.

4. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real, centered random variables on a probability space. Prove: The sequence $(X_n)$ satisfies the weak law of large numbers if and only if the sequence of functions

$$x \mapsto \prod_{j=1}^{n} \varphi_{X_j}\Bigl(\frac{x}{n}\Bigr) \qquad (n \in \mathbb{N})$$


converges pointwise to 1. Apply this result to the sequence (Xn) 
studied in Section 6.4, Problem 1 and prove that for ^ 2 1 the weak 
law fails. This proves again the result of Section 6.5, Problem 2. 
5. Let $K \ge 0$ be a function in $\mathcal{L}^1(\lambda^p)$ satisfying $\int K\,d\lambda^p = 1$. Define $K_r$ for $r > 0$ as in Section 7.7, Example 4. Prove that

$$\lim_{r \to +\infty} (K_r * \mu)\lambda^p = \mu$$

holds in the weak topology for all $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$. [Hint: Section 7.7, Example 4 treats the special case $\mu = \varepsilon_0$. Observe that

$$x \mapsto \int f(x + y)\,\mu(dy)$$

is in $\mathcal{C}^b(\mathbb{R}^p)$ for all $f \in \mathcal{C}^b(\mathbb{R}^p)$.]

6. A function $\delta \ge 0$ in $\mathcal{L}^1(\lambda^p)$ is called a convergence factor on $\mathbb{R}^p$ if its Fourier transform satisfies $\int \hat\delta\,d\lambda^p = 1$. Prove that $x \mapsto (2\pi)^{-p} e^{-|x|^2/2}$ is a convergence factor on $\mathbb{R}^p$.

7. Let $\delta$ be a convergence factor on $\mathbb{R}^p$, and let $\mu$ be in $\mathcal{M}_+^b(\mathbb{R}^p)$. Denote by $s_r(\mu)$ the function

$$x \mapsto \int e^{-i\langle x,y\rangle}\,\delta\Bigl(\frac{y}{r}\Bigr)\,\hat\mu(y)\,\lambda^p(dy) \qquad (r > 0).$$

(a) Prove the following inversion formula:

$$\lim_{r \to +\infty} s_r(\mu)\lambda^p = \mu$$

in the weak topology. [Hint: $x \mapsto \hat\delta(-x)$ is a function $K$ with the properties mentioned in Problem 4. The result is a consequence of Section 8.1, Problem 5 and the above Problem 4.]

(b) Deduce from (a) a new proof of the Uniqueness Theorem 8.2.4.


8.3 DIFFERENTIABILITY OF FOURIER TRANSFORMS 


The following lemma is a preliminary to answering the question of 
differentiability of Fourier transforms: 


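8.3.1. Lemma. Let $(\Omega,\mathcal{A},\mu)$ be a measure space, $U$ an open subset of $\mathbb{R}$ or $\mathbb{C}$, and $f: U \times \Omega \to \mathbb{C}$ a function with the following properties:


(a) $\omega \mapsto f(t,\omega)$ is $\mu$-integrable for each $t \in U$;


(b) $t \mapsto f(t,\omega)$ is differentiable in $t_0 \in U$ for all $\omega \in \Omega$; the derivative is denoted by $f'(t_0,\omega)$;


(c) there exists a $\mu$-integrable function $h \ge 0$ on $\Omega$ such that

$$\Bigl|\frac{f(t,\omega) - f(t_0,\omega)}{t - t_0}\Bigr| \le h(\omega)$$

for all $\omega \in \Omega$ and all $t \in U \setminus \{t_0\}$.

Then the function $\varphi: U \to \mathbb{C}$ defined by $t \mapsto \int f(t,\omega)\,\mu(d\omega)$ is differentiable in $t_0$, $\omega \mapsto f'(t_0,\omega)$ is $\mu$-integrable, and $\varphi'(t_0) = \int f'(t_0,\omega)\,\mu(d\omega)$.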


Thus under the given conditions we can differentiate under the integral sign.

Proof. Let $(t_n)$ be a sequence in $U$ with $\lim t_n = t_0$ and $t_n \ne t_0$ for all $n$. Then for every $n \in \mathbb{N}$,

$$g_n(\omega) = \frac{f(t_n,\omega) - f(t_0,\omega)}{t_n - t_0}$$

is $\mu$-integrable and

$$\lim_{n \to \infty} g_n(\omega) = f'(t_0,\omega), \qquad \text{for all } \omega \in \Omega.$$

Moreover, $|g_n| \le h$ for all $n$. By the Dominated Convergence Theorem $\omega \mapsto f'(t_0,\omega)$ is $\mu$-integrable and

$$\lim_{n \to \infty} \int g_n\,d\mu = \int f'(t_0,\omega)\,\mu(d\omega).$$

This proves the lemma since

$$\int g_n\,d\mu = \frac{\varphi(t_n) - \varphi(t_0)}{t_n - t_0} \qquad (n \in \mathbb{N}).$$


Remark. 1. Condition (c) is satisfied if $t \mapsto f(t,\omega)$ is differentiable in $U$ for every $\omega \in \Omega$, if for every point $t \in U$ the line segment connecting $t$ and $t_0$ lies in $U$, and if there exists an $h \in \mathcal{L}^1(\mu)$ such that

$$|f'(t,\omega)| \le h(\omega), \qquad \text{for all } (t,\omega) \in U \times \Omega.$$

To see this, it suffices to apply the Mean Value Theorem of differential calculus.

Now we return to the study of Fourier transforms.


8.3.2. Definition. Let $\mu$ be a (not necessarily finite) Borel measure on $\mathbb{R}^p$ and let $k_1, \ldots, k_p$ be integers $\ge 0$. Then if

$$(x_1, \ldots, x_p) \mapsto x_1^{k_1} \cdot \ldots \cdot x_p^{k_p}$$

is $\mu$-integrable on $\mathbb{R}^p$,

$$M_{k_1,\ldots,k_p} = \int x_1^{k_1} \cdot \ldots \cdot x_p^{k_p}\,\mu(dx)$$

is called the $(k_1, \ldots, k_p)$th moment of $\mu$ and $k_1 + \cdots + k_p$ its order. We then say that the $(k_1, \ldots, k_p)$th moment of $\mu$ exists.

Examples 


1. There is exactly one moment of order 0. It exists if and only if $\mu$ is finite, and $M_{0,\ldots,0} = \|\mu\|$.


2. Let $X_1, \ldots, X_p$ be real random variables on a probability space $(\Omega,\mathcal{A},P)$ with joint distribution $\mu = P_{X_1 \otimes \cdots \otimes X_p}$. By the Transformation Theorem for Integrals, the $(k_1, \ldots, k_p)$th moment $M_{k_1,\ldots,k_p}$ of $\mu$ exists if and only if the random variable $X_1^{k_1} \cdot \ldots \cdot X_p^{k_p}$ is $P$-integrable. Then we have

$$M_{k_1,\ldots,k_p} = E(X_1^{k_1} \cdot \ldots \cdot X_p^{k_p}).$$

We also call $M_{k_1,\ldots,k_p}$ the $(k_1, \ldots, k_p)$th (mixed) moment of the random variables $X_1, \ldots, X_p$.


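3. In the case $p = 1$, the existence of the $k$th moment $M_k$ of a measure $\mu \in \mathcal{M}_+^b(\mathbb{R})$ implies the existence of all moments $M_l$ with $0 \le l \le k$. Indeed, $|x|^l \le 1 + |x|^k$ for all $x \in \mathbb{R}$.


4. All moments $M_k$, $k = 0, 1, \ldots$, exist for the standard normal distribution $\nu_{0,1} = g_{0,1}\lambda^1$ (and thus also for $\nu_{a,\sigma^2}$ with arbitrary $a \in \mathbb{R}$, $\sigma > 0$).

For even $k$, integration by parts (see the derivation of (4.4.11) and (4.4.12) on p. 145) yields the recursion formula $M_{2n} = (2n - 1)\,M_{2n-2}$ and hence

$$M_{2n} = (2n - 1)(2n - 3) \cdot \ldots \cdot 3 \cdot 1 \qquad (n = 1, 2, \ldots).$$

For odd $k$, taking note of Example 3, we obtain the existence of $M_k$. Due to the symmetry of $g_{0,1}$,

$$M_{2n+1} = 0 \qquad (n = 0, 1, \ldots).$$


5. In the case $p > 1$, we cannot generally derive the existence of all moments $M_{l_1,\ldots,l_p}$ with $0 \le l_j \le k_j$ from the existence of a moment $M_{k_1,\ldots,k_p}$. In fact, let $\Omega = \,]0,1[$, $\mathcal{A} = \mathcal{B}(\Omega) = \Omega \cap \mathcal{B}^1$, and $P = \lambda^1_\Omega$. For numbers $\alpha, \beta$ satisfying $0 < \alpha \le \beta < 1$, we consider the following random variables $X \ge 0$, $Y \ge 0$ on $(\Omega,\mathcal{A},P)$:

$$X(\omega) = \begin{cases} \dfrac{1}{\omega}, & 0 < \omega < \beta \\ 0, & \beta \le \omega < 1 \end{cases} \qquad Y(\omega) = \begin{cases} \dfrac{1}{1 - \omega}, & \alpha < \omega < 1 \\ 0, & 0 < \omega \le \alpha. \end{cases}$$

Then $X \cdot Y$ is integrable, but neither $X$ nor $Y$ is. For the joint distribution $\mu = P_{X \otimes Y}$ on $\mathbb{R}^2$, the $(1,1)$th moment exists, but $M_{0,1}$ and $M_{1,0}$ do not. If $\alpha < \beta$, then $M_{1,1} > 0$, and if $\alpha = \beta$, then $M_{1,1} = 0$.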
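
The recursion of Example 4 is easy to compare with a direct numerical evaluation of the moment integrals of $\nu_{0,1}$. The following sketch does so with a truncated trapezoidal rule applied to the standard normal density $(2\pi)^{-1/2} e^{-y^2/2}$; the function names and numerical parameters are illustrative assumptions.

```python
import math


def normal_moment_recursive(k):
    """k-th moment of nu_{0,1}: odd moments vanish, M_0 = 1 and M_{2n} = (2n - 1) * M_{2n-2}."""
    if k % 2 == 1:
        return 0.0
    moment = 1.0
    for j in range(1, k, 2):  # multiplies 1 * 3 * ... * (k - 1)
        moment *= j
    return moment


def normal_moment_numeric(k, half_width=14.0, steps=28000):
    """Trapezoidal approximation of int y^k * g_{0,1}(y) dy over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        y = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * y ** k * math.exp(-y * y / 2)
    return total * h / math.sqrt(2 * math.pi)


for k in range(7):
    print(k, normal_moment_recursive(k), normal_moment_numeric(k))
```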


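8.3.3. Theorem. Let $\mu$ be a finite Borel measure on $\mathbb{R}^p$ for which all moments $M_{l_1,\ldots,l_p}$ with $0 \le l_j \le k_j$ $(j = 1, \ldots, p)$ exist. Then for all such $l_1, \ldots, l_p$, the partial derivative

$$D_{l_1,\ldots,l_p}\hat\mu = \frac{\partial^{\,l_1 + \cdots + l_p}\,\hat\mu}{\partial x_1^{l_1} \cdots \partial x_p^{l_p}}$$

of the Fourier transform $\hat\mu$ exists on all of $\mathbb{R}^p$; we have

$$D_{l_1,\ldots,l_p}\hat\mu(x) = i^{\,l_1 + \cdots + l_p} \int e^{i\langle x,y\rangle}\,y_1^{l_1} \cdot \ldots \cdot y_p^{l_p}\,\mu(dy) \qquad (8.3.1)$$

and in particular

$$D_{l_1,\ldots,l_p}\hat\mu(0) = i^{\,l_1 + \cdots + l_p}\,M_{l_1,\ldots,l_p}. \qquad (8.3.2)$$

Each of the derivatives is uniformly continuous and bounded on $\mathbb{R}^p$.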


Proof. For every $y = (y_1, \ldots, y_p) \in \mathbb{R}^p$, $x \mapsto f_y(x) = e^{i\langle x,y\rangle}$ is differentiable arbitrarily often on $\mathbb{R}^p$. We have

$$D_{l_1,\ldots,l_p} f_y(x) = f_y(x)\,(iy_1)^{l_1} \cdot \ldots \cdot (iy_p)^{l_p}$$

and hence

$$|D_{l_1,\ldots,l_p} f_y(x)| = |y_1|^{l_1} \cdot \ldots \cdot |y_p|^{l_p}.$$

For all $l_1, \ldots, l_p$ with $0 \le l_j \le k_j$, $y \mapsto D_{l_1,\ldots,l_p} f_y(x)$ is then $\mu$-integrable; moreover, $|y_1|^{l_1} \cdot \ldots \cdot |y_p|^{l_p}$ is a $\mu$-integrable majorant of $D_{l_1,\ldots,l_p} f_y(x)$ and independent of $x$. Thus, by Lemma 8.3.1 together with the remark following it,

$$x \mapsto \int D_{l_1,\ldots,l_p} f_y(x)\,\mu(dy)$$

is differentiable with respect to $x_j$, provided $l_j < k_j$. The partial differentiation can be put under the integral sign. Using induction, we now obtain the existence of all partial derivatives $D_{l_1,\ldots,l_p}\hat\mu$, $0 \le l_j \le k_j$, as well as formulas (8.3.1) and (8.3.2). The rest of the assertion follows from Theorem 8.1.4 and the remark that the integral in (8.3.1) is the difference of the Fourier transforms of measures with densities

$$(y_1^{l_1} \cdot \ldots \cdot y_p^{l_p})^+ \quad\text{and}\quad (y_1^{l_1} \cdot \ldots \cdot y_p^{l_p})^-$$

with respect to $\mu$.


8.3.4. Corollary. If the moment $M_k$ of $k$th order exists for a finite Borel measure $\mu$ on the real line $\mathbb{R}$, then $\hat\mu$ is $k$ times differentiable. The $k$th derivative $\hat\mu^{(k)}$ is uniformly continuous and bounded. We have

$$\hat\mu^{(k)}(0) = i^k M_k \qquad (k = 0, 1, \ldots). \qquad (8.3.3)$$

This follows directly from Theorem 8.3.3 if we take Example 3 into account. The case $k = 0$ was taken care of by Theorem 8.1.4.

A typical application of this corollary is the following: Let $\mu \in \mathcal{M}_+^b(\mathbb{R})$ be such that $M_k$ exists for some $k = 1, 2, \ldots$ and hence $M_l$ exists for all $l$ such that $0 \le l \le k$. Then $\hat\mu$ has the Taylor expansion

$$\hat\mu(x) = \sum_{l=0}^{k-1} \frac{(ix)^l}{l!}\,M_l + \frac{1}{(k-1)!} \int_0^x (x - t)^{k-1}\,\hat\mu^{(k)}(t)\,dt$$
$$= \sum_{l=0}^{k} \frac{(ix)^l}{l!}\,M_l + \frac{1}{(k-1)!} \int_0^x (x - t)^{k-1}\,\bigl[\hat\mu^{(k)}(t) - \hat\mu^{(k)}(0)\bigr]\,dt. \qquad (8.3.4)$$

For the remainder term $R_k(x)$ of the above Taylor expansion we then obtain, in particular, the bound

$$|R_k(x)| \le \sup_{0 \le \theta \le 1} |\hat\mu^{(k)}(\theta x) - \hat\mu^{(k)}(0)| \cdot \frac{|x|^k}{k!}. \qquad (8.3.5)$$

Finally, we investigate the question under which conditions the Fourier transform (defined on $\mathbb{R}^p$) of a measure $\mu \in \mathcal{M}_+^b(\mathbb{R}^p)$ can be extended holomorphically to the complex $p$-dimensional space $\mathbb{C}^p$. That this is not always possible is shown by the example of the Cauchy distribution $\gamma_\alpha$ whose Fourier transform, as a function on $\mathbb{R}$, does not have a derivative at $x = 0$.

We decompose every point $z = (z_1, \ldots, z_p) \in \mathbb{C}^p$ into its real and imaginary components $x \in \mathbb{R}^p$ and $x' \in \mathbb{R}^p$, respectively, so that $z = x + ix'$ and therefore $x$ and $x'$ are the vectors of the real and imaginary parts respectively of all coordinates $z_1, \ldots, z_p$. More precisely, we investigate the question of when $\hat\mu$ can be extended holomorphically onto a "strip" of the form

$$S_\alpha = \{z = x + ix' \in \mathbb{C}^p : |x_1'| < \alpha, \ldots, |x_p'| < \alpha\} \qquad (\alpha > 0).$$

We set

$$\langle z,y\rangle = \sum_{j=1}^{p} z_j y_j$$

for arbitrary $z \in \mathbb{C}^p$ and $y \in \mathbb{R}^p$.


* The reader should note that for complex-valued functions, the remainder term cannot generally be written in Lagrange form, although it can be written in integral form.


8.3.5. Theorem. Let $\mu \in \mathcal{M}^b_+(\mathbb{R}^p)$ and $\alpha > 0$ be such that the function $y \mapsto e^{\langle x',y\rangle}$ is $\mu$-integrable for all $x' = (x'_1, \ldots, x'_p) \in \mathbb{R}^p$ such that $|x'_1| < \alpha, \ldots, |x'_p| < \alpha$. Then the function
$$\varphi(z) = \int e^{i\langle z,y\rangle}\,\mu(dy) \tag{8.3.6}$$
is holomorphic in the strip $S_\alpha$ and is obviously an extension of $\hat\mu$.


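Proof. First, the function $y \mapsto e^{i\langle z,y\rangle}$ is $\mu$-integrable for every $z = x + ix' \in S_\alpha$, since $|e^{i\langle z,y\rangle}| = e^{-\langle x',y\rangle} = e^{\langle -x',y\rangle}$ and $|x'_j| < \alpha$ for all $j = 1, \ldots, p$. Further, for each such $j$, the function $z \mapsto e^{i\langle z,y\rangle}$ is differentiable with respect to $z_j$ for each $y \in \mathbb{R}^p$ and we have
$$\frac{\partial}{\partial z_j} e^{i\langle z,y\rangle} = i y_j\, e^{i\langle z,y\rangle}.$$

Every point $z^0 \in S_\alpha$ now possesses a strip $S_{\beta'}$ with $0 < \beta' < \alpha$ as a neighborhood in $\mathbb{C}^p$. We choose $\delta > 0$ such that $\beta = \beta' + \delta < \alpha$.

On the one hand, there exists a positive number $M$ such that $|\xi| \le M e^{\delta|\xi|}$ holds for all $\xi \in \mathbb{R}$. On the other hand, the function
$$y \mapsto e^{\beta(|y_1| + \cdots + |y_p|)}$$
is $\mu$-integrable. Indeed we have
$$e^{\beta(|y_1| + \cdots + |y_p|)} \le \sum_{t=1}^{2^p} e^{\langle b_t, y\rangle},$$
if $b_1, \ldots, b_{2^p}$ denote all vectors in $\mathbb{R}^p$ each of whose coordinates has absolute value $\beta$. However, we have a sum of $\mu$-integrable functions on the right by hypothesis. For all points $z = x + ix' \in S_{\beta'}$ and all $y \in \mathbb{R}^p$, we then have the inequality
$$\left|\frac{\partial}{\partial z_j} e^{i\langle z,y\rangle}\right| = |y_j|\,e^{-\langle x',y\rangle} \le M e^{\delta|y_j|}\, e^{|x'_1||y_1| + \cdots + |x'_p||y_p|} \le M e^{\beta(|y_1| + \cdots + |y_p|)}.$$

By Lemma 8.3.1 and the subsequent remark, $\varphi$ is differentiable with respect to $z_j$ in $S_{\beta'}$, hence in a neighborhood of $z^0$. Since this holds for every $j = 1, \ldots, p$, $\varphi$ is holomorphic in $S_\alpha$. □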


Remarks. 2. According to well-known theorems of function theory, $\varphi$ is the only holomorphic function in $S_\alpha$ which coincides with $\hat\mu$ on $\mathbb{R}^p$.


3. The integrability condition given in Theorem 8.3.5 is even necessary in order that $\hat\mu$ may be continued holomorphically onto $S_\alpha$. See Richter [20], p. 311 ff.
PROBLEMS


1. Let $\mu$ be a finite Borel measure on $\mathbb{R}$ for which the moment $M_k$ and hence all moments $M_l$ for $l = 0, 1, \ldots, k$ exist ($k \in \mathbb{N}$). Prove the existence of a continuous function $\Theta: \mathbb{R} \to \mathbb{C}$ such that
(a) $\Theta(0) = 0$;
(b) $\hat\mu(x) = \sum_{l=0}^{k} \frac{(ix)^l}{l!} M_l + \Theta(x)\,\frac{x^k}{k!}$ for all $x \in \mathbb{R}$.

2. Prove (without and with the aid of Fourier transforms) that no moment $M_k$ of order $k \ge 1$ exists for the Cauchy distribution.


part IV 


Further Development 
of Probability Theory 


9 


LIMIT DISTRIBUTIONS 


The following considerations on the so-called Central Limit Theorem rightly deserve a "central" significance in probability theory. A multitude of applications, especially in mathematical statistics, underscore the importance of the limit distribution problems to be treated here, whose solution was the chief problem in probability theory for a long time. This chapter also gives the reader the opportunity to appreciate the fruitfulness of the methods of Fourier analysis.


9.1 EXAMPLES OF LIMIT THEOREMS 


Suppose we are given a sequence $(S_n)_{n \in \mathbb{N}}$ of real random variables on a probability space $(\Omega, \mathfrak{A}, P)$. Suppose every variable $S_n$ is the sum of an independent, finite family $(X_{nj})_{j=1,\ldots,k_n}$ of real random variables $X_{nj}$ (on the same probability space):
$$S_n = X_{n1} + \cdots + X_{nk_n} \quad (n = 1, 2, \ldots). \tag{9.1.1}$$
The following examples show that under simple assumptions on the random variables $X_{nj}$, the distributions $(P_{S_n})$ converge weakly to certain familiar limit distributions.


Examples 


1. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real, integrable random variables. We choose $k_n = n$ and $X_{nj} = (1/n)(X_j - E(X_j))$, that is,
$$S_n = \frac{1}{n}\sum_{j=1}^{n} (X_j - E(X_j)).$$
By Theorem 7.7.2, the sequence $(X_n)$ satisfies the weak law of large numbers if and only if the sequence $(P_{S_n})$ is weakly convergent to the singular distribution $\varepsilon_0$ defined by the unit mass at the origin. The condition
$$\lim_{n\to\infty} \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = 0$$
formulated in (6.5.2) for the sequence $(X_n)$ thus implies the weak convergence of $(P_{S_n})$ to $\varepsilon_0$.


2. Let $k_n = n$ and suppose each of the (independent) random variables $X_{nj}$, $j = 1, \ldots, n$, has distribution $\beta_1^{p_n}$ with $0 \le p_n \le 1$. Further, suppose that the sequence $(np_n)$ converges to some $a \in \mathbb{R}_+$:
$$\lim_{n\to\infty} np_n = a. \tag{9.1.2}$$

Then, in the sense of weak convergence,
$$\lim_{n\to\infty} P_{S_n} = \pi_a,$$
where $\pi_a$ is the Poisson distribution with parameter $a$ (or $\varepsilon_0$ when $a = 0$). This result is due to Poisson.

For the proof, we note that $S_n$ has distribution $\beta_n^{p_n}$, and thus for each $f \in \mathcal{K}(\mathbb{R})$
$$\int f\,dP_{S_n} = \sum_{k=0}^{n} f(k)\binom{n}{k} p_n^k (1 - p_n)^{n-k}.$$

Since there are only finitely many integers $k \ge 0$ in the support of $f$, the asserted convergence of $(P_{S_n})$ to $\pi_a$ is equivalent to the validity of
$$\lim_{n\to\infty} \binom{n}{k} p_n^k (1 - p_n)^{n-k} = e^{-a}\frac{a^k}{k!} \quad (k = 0, 1, \ldots). \tag{9.1.3}$$

But this can be seen directly, since
$$\binom{n}{k} p_n^k (1 - p_n)^{n-k} = \frac{1}{k!}\left(1 - \frac{np_n}{n}\right)^{n-k} \prod_{j=1}^{k} [(n - k + j)\,p_n]$$
and since $\lim np_n = a$ implies $\lim p_n = 0$.

Remark. Formula (9.1.3) tells us that the probability
$$P\{S_n = k\} = \binom{n}{k} p_n^k (1 - p_n)^{n-k}$$
is approximately equal to $e^{-a}a^k/k!$ for large $n$, as long as $p_n$ is asymptotically equal to $a/n$. This explains why the Poisson distribution is also called the distribution of rare events.


Formula (9.1.3) is used in many practical applications where a random occurrence can be described by a random variable $X$ with binomial distribution, which can take on each of the values $0, 1, \ldots, n$ for large $n$ with constant small probability $p$, but whose expected value $E(X) = pn$ is a known number $a$ on the basis of observations. Examples of this kind are the description of rare diseases or accidents in a large "population" (say the set of subscribers of an insurance company), the description of the disintegration of an atomic nucleus, or the number of defectives in the daily production of a factory.
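
The approximation in (9.1.3) is easy to observe numerically. The sketch below is only an illustration under the assumption $p_n = a/n$; the helper names `binomial_pmf`, `poisson_pmf`, and `max_gap` are hypothetical and introduced here for the example.

```python
import math

def binomial_pmf(n: int, p: float, k: int) -> float:
    """P{S_n = k} for S_n with binomial distribution (n trials, success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(a: float, k: int) -> float:
    """Mass of the Poisson distribution with parameter a at the point k."""
    return math.exp(-a) * a**k / math.factorial(k)

def max_gap(n: int, a: float, kmax: int = 10) -> float:
    """Largest difference between the two sides of (9.1.3) over k = 0..kmax,
    with the assumed choice p_n = a / n."""
    p = a / n
    return max(
        abs(binomial_pmf(n, p, k) - poisson_pmf(a, k)) for k in range(kmax + 1)
    )

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, max_gap(n, 2.0))
```

The printed gaps shrink as $n$ grows, in line with the limit statement.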


3. Let $(X_n)$ be an independent sequence of real, square integrable, identically distributed random variables such that $V(X_n) > 0$. Suppose we choose each $k_n = n$ and let
$$X_{nj} = \frac{X_j - E(X_j)}{\sigma(X_1 + \cdots + X_n)} \quad (j = 1, 2, \ldots, n).$$
Then the sequence $(P_{S_n})$ converges weakly to the normal distribution $\nu_{0,1}$. The proof of this is our first example of the usefulness of Fourier analysis. We can obviously assume that $E(X_n) = 0$. By hypothesis, the distributions $P_{X_n}$ and standard deviations $\sigma(X_n)$ are independent of $n$; we set $\mu = P_{X_n}$ and $\sigma = \sigma(X_n)$. Then $S_n$ is written in the form
$$S_n = \frac{X_1 + \cdots + X_n}{\sigma\sqrt{n}}.$$

Since $x \mapsto x/(\sigma\sqrt{n})$ is a linear mapping of $\mathbb{R}$ into itself, it follows that
$$x \mapsto \left[\hat\mu\!\left(\frac{x}{\sigma\sqrt{n}}\right)\right]^n \quad (x \in \mathbb{R})$$
is the characteristic function of $S_n$ according to the familiar rules for Fourier transforms. By the Continuity Theorem, we need to show that for every $x \in \mathbb{R}$,
$$\lim_{n\to\infty} \left[\hat\mu\!\left(\frac{x}{\sigma\sqrt{n}}\right)\right]^n = e^{-x^2/2}.$$

Due to the square integrability of the $X_n$, the moment of second order $M_2 = V(X_n) = \sigma^2$ exists for $\mu$; from (8.3.4) and (8.3.5), observing that $M_1 = E(X_n) = 0$, we obtain [see Section 8.3, Problem 1]
$$\hat\mu(x) = 1 - \frac{\sigma^2}{2}x^2 + \Theta(x)\,\frac{x^2}{2},$$
where $\Theta$ is a complex function on $\mathbb{R}$ such that
$$|\Theta(x)| \le \sup_{0 \le \vartheta \le 1} |\hat\mu''(\vartheta x) - \hat\mu''(0)|,$$
that is, such that $\lim_{x\to 0} \Theta(x) = 0$. We have to examine the limit behavior of
$$\left[\hat\mu\!\left(\frac{x}{\sigma\sqrt{n}}\right)\right]^n = \left(1 + \frac{-x^2/2 + \Theta_n x^2}{n}\right)^n$$
as $n \to \infty$, where $\Theta_n = \frac{1}{2\sigma^2}\,\Theta(x/(\sigma\sqrt{n}))$. But now $\lim \Theta_n = 0$ and thus $e^{-x^2/2}$ is in fact the limit as $n \to \infty$.


If in particular $P_{X_n} = \beta_1^p$ is the binomial distribution with $0 < p < 1$, then the above result is the so-called De Moivre-Laplace Theorem.
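
The convergence of characteristic functions used in Example 3 can be watched numerically in the De Moivre-Laplace case. The following is a minimal sketch, assuming centered Bernoulli variables $X_j - p$; the function names are hypothetical and chosen only for this illustration.

```python
import cmath
import math

def centered_bernoulli_cf(x: float, p: float) -> complex:
    """Characteristic function of X - p, where X takes the value 1 with
    probability p and the value 0 with probability 1 - p."""
    return (1 - p) * cmath.exp(-1j * p * x) + p * cmath.exp(1j * (1 - p) * x)

def standardized_sum_cf(x: float, p: float, n: int) -> complex:
    """Characteristic function of S_n = (X_1 + ... + X_n - n p) / (sigma sqrt(n))."""
    sigma = math.sqrt(p * (1 - p))
    return centered_bernoulli_cf(x / (sigma * math.sqrt(n)), p) ** n

def gap(x: float, p: float, n: int) -> float:
    """Distance between the characteristic function of S_n and exp(-x^2 / 2)."""
    return abs(standardized_sum_cf(x, p, n) - math.exp(-x * x / 2))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, gap(1.0, 0.3, n))
```

The gap decreases with $n$, which is the pointwise convergence required by the Continuity Theorem.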


4. Let $(X_n)$ be an independent sequence of real random variables each with Cauchy distribution $\gamma_1$. Suppose we choose $k_n = n$ and let $X_{nj} = (1/n)X_j$, that is,
$$S_n = \frac{1}{n}(X_1 + \cdots + X_n).$$

Then by Section 8.2, Example 3, $X_1 + \cdots + X_n$ has distribution $\gamma_n$, and thus $P_{S_n} = \gamma_1$ for all $n$. Therefore, trivially, the sequence $(P_{S_n})$ converges weakly to the Cauchy distribution $\gamma_1$.


This example once again emphasizes how essential the integrability of the random variables is for the validity of the weak (or strong) law of large numbers: according to Theorem 7.7.2, the weak law would amount to weak convergence of the sequence $(P_{S_n})$ to $\varepsilon_0$, whereas here the limit is $\gamma_1$.
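
The stability of the Cauchy distribution under averaging can be seen directly from characteristic functions. The sketch below assumes the standard fact that $\gamma_\alpha$ has the Fourier transform $x \mapsto e^{-\alpha|x|}$; the function names are hypothetical.

```python
import math

def cauchy_cf(x: float, alpha: float = 1.0) -> float:
    """Characteristic function of the Cauchy distribution gamma_alpha (assumed form)."""
    return math.exp(-alpha * abs(x))

def mean_cf(x: float, n: int) -> float:
    """Characteristic function of (X_1 + ... + X_n) / n for independent
    gamma_1-distributed X_j: the product of n copies of cauchy_cf(x / n)."""
    return cauchy_cf(x / n) ** n

if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, mean_cf(1.3, n), cauchy_cf(1.3))
```

Up to rounding, both printed values agree for every $n$, so the averages never concentrate at the origin.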

The observation that we can actually obtain every probability measure $\nu$ on $\mathbb{R}$ as the weak limit of a sequence $(P_{S_n})$ is crucial for what follows. Suppose we choose a real random variable $Y$ with distribution $P_Y = \nu$ and set $k_n = 1$, $X_{n1} = Y$ for all $n = 1, 2, \ldots$. Then the sequence $(P_{S_n})$ is constantly equal to $\nu$ and therefore trivially converges weakly to $\nu$.

This example shows that a summand $Y$ may dominate the sum $S_n$. But in many theoretical and practical examples, the variables $S_n$ are composed of summands whose influence becomes arbitrarily small for large $n$. As we shall show, the restrictions this imposes on the possible limits of weakly convergent sequences $(P_{S_n})$ are all the more noteworthy.


First, we propose the task of defining when there should not be a predominant influence of individual summands on the behavior of the sums $S_n$. The following definition will soon prove to be useful.




9.1.1. Definition. A family $(X_{nj})_{j=1,\ldots,k_n;\ n=1,2,\ldots}$ of real random variables is said to be asymptotically negligible if
$$\lim_{n\to\infty} \max_{1 \le j \le k_n} P\{|X_{nj}| \ge \varepsilon\} = 0 \quad \text{for all } \varepsilon > 0. \tag{9.1.4}$$

Thus it is required that $P\text{-}\lim_{n\to\infty} X_{nj} = 0$ uniformly in $j$.


Examples 


5. If each of the random variables $X_{nj}$ is square integrable with expected value $E(X_{nj}) = 0$ and if
$$\lim_{n\to\infty} \max_{1 \le j \le k_n} V(X_{nj}) = 0, \tag{9.1.5}$$
then the family $(X_{nj})$ is asymptotically negligible. This follows from the Chebyshev inequality, according to which
$$P\{|X_{nj}| \ge \varepsilon\} \le \frac{1}{\varepsilon^2}\,V(X_{nj}) \quad (\varepsilon > 0).$$

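6. If the random variables $X_{nj}$ all have the same distribution $\mu$, then the family $((1/n)X_{nj})$ is asymptotically negligible. We have
$$P\left\{\left|\frac{1}{n}X_{nj}\right| \ge \varepsilon\right\} = P\{|X_{nj}| \ge n\varepsilon\} = \mu(A_n),$$
if we set $A_n = \{x \in \mathbb{R} : |x| \ge n\varepsilon\}$. Since $A_n \downarrow \emptyset$, $\lim \mu(A_n) = 0$ and hence (9.1.4) follows.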


7. In all four introductory examples, where we have weak convergence to $\varepsilon_0$, $\pi_a$, $\nu_{0,1}$, or $\gamma_1$, the basic families $(X_{nj})$ involved are asymptotically negligible. In fact, for Example 1, this follows from Example 5 and condition (6.5.2). We need to observe only that $E(X_{nj}) = 0$ and
$$\max_{1 \le j \le n} V(X_{nj}) = \frac{1}{n^2}\max_{1 \le j \le n} V(X_j) \le \frac{1}{n^2}\sum_{j=1}^{n} V(X_j).$$
In Example 2, $X_{nj}$ has distribution $\beta_1^{p_n}$ and $\lim p_n = 0$. Thus we have to observe only that
$$P\{|X_{nj}| \ge \varepsilon\} = \begin{cases} 0, & \text{if } \varepsilon > 1, \\ p_n, & \text{if } 0 < \varepsilon \le 1, \end{cases}$$
for all $n$ and $j$.
Example 3 is settled by Example 5 since obviously
$$V(X_{nj}) = \frac{V(X_j)}{n\sigma^2} = \frac{1}{n}$$
for all $n$ and $j$.
Finally, Example 6 shows that the family $(X_{nj})$ of Example 4 is also asymptotically negligible.


8. The example preceding Definition 9.1.1 provides us with a family which is not asymptotically negligible. A family $(X_{n1})_{n=1,2,\ldots}$ for which $k_n = 1$ for all $n$ is asymptotically negligible if and only if the sequence $(X_{n1})$ converges stochastically to zero.


By Theorem 7.7.2 and the Continuity Theorem, a sequence $(X_n)$ of real random variables converges stochastically to zero if and only if the sequence $(\varphi_{X_n})$ of characteristic functions converges pointwise to $\hat\varepsilon_0 = 1$. The following lemma sharpens part of this statement.


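9.1.2. Lemma. If a family $(X_{nj})_{j=1,\ldots,k_n}$ of real random variables is asymptotically negligible, then for the associated family $(\varphi_{X_{nj}})$ of characteristic functions,
$$\lim_{n\to\infty} \max_{1 \le j \le k_n} |\varphi_{X_{nj}}(x) - 1| = 0 \quad \text{for all } x \in \mathbb{R}. \tag{9.1.6}$$

Thus the pointwise convergence of $(\varphi_{X_{nj}})$ to 1 as $n \to \infty$ is uniform in $j$.


Proof. For all allowable values of $n$ and $j$, for all $x \in \mathbb{R}$ and every $\varepsilon > 0$, we have
$$|\varphi_{X_{nj}}(x) - 1| = \left|\int (e^{ixy} - 1)\,P_{X_{nj}}(dy)\right| \le \int_{|y| < \varepsilon} |e^{ixy} - 1|\,P_{X_{nj}}(dy) + \int_{|y| \ge \varepsilon} |e^{ixy} - 1|\,P_{X_{nj}}(dy) \le \varepsilon|x| + 2P\{|X_{nj}| \ge \varepsilon\}.$$

This follows from
$$|e^{it} - 1| = \left|\int_0^t e^{i\tau}\,d\tau\right| \le |t| \quad \text{for all } t \in \mathbb{R}. \tag{9.1.7}$$

Thus we have
$$\max_{1 \le j \le k_n} |\varphi_{X_{nj}}(x) - 1| \le \varepsilon|x| + 2\max_{1 \le j \le k_n} P\{|X_{nj}| \ge \varepsilon\}$$
and hence the assertion. □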
PROBLEM


Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of identically distributed, real, integrable random variables. Use the Fourier transform technique developed in Example 3 in order to prove that the sequence $(X_n)$ satisfies the weak law of large numbers.


9.2 THE CENTRAL LIMIT THEOREM 


We now turn back to Example 3 of Section 9.1, which is especially significant in the question of limit distributions. The first reason for this comes from the observation that, under the hypotheses imposed on the sequence $(X_n)$ in Example 3, the sequence $(X_n)$ satisfies the strong law of large numbers (by Theorem 6.4.2), and thus as $n \to \infty$, $(1/n)\sum_{j=1}^{n} X_j$ converges $P$-almost surely to the expected value $\eta = E(X_n)$, which is independent of $n$. The weak convergence of the sequence $(P_{S_n})$ with
$$S_n = \frac{1}{\sigma\sqrt{n}}\sum_{j=1}^{n} (X_j - \eta) \quad (n = 1, 2, \ldots)$$
to $\nu_{0,1}$ can then be used to compute approximately the probability of a given deviation of the random variable $(1/n)\sum_{j=1}^{n} X_j$ from $\eta$. By Theorem 7.7.11 we have
$$\lim_{n\to\infty} P\left\{\alpha \le \frac{1}{\sigma\sqrt{n}}\sum_{j=1}^{n} (X_j - \eta) < \beta\right\} = \frac{1}{\sqrt{2\pi}}\int_\alpha^\beta e^{-t^2/2}\,dt$$
uniformly in $\alpha$ and $\beta$. The numbers
$$P\left\{\gamma \le \frac{1}{n}\sum_{j=1}^{n} X_j - \eta < \delta\right\} \quad\text{and}\quad \frac{1}{\sqrt{2\pi}}\int_{\gamma\sqrt{n}/\sigma}^{\delta\sqrt{n}/\sigma} e^{-t^2/2}\,dt$$
thus are arbitrarily close to each other for large $n$, and indeed uniformly in $\gamma$ and $\delta \in \mathbb{R}$.
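
To make the approximation concrete, the following sketch compares the exact probability with the normal integral for Bernoulli variables. It is an illustration only: the Bernoulli assumption (so that $\eta = p$ and $\sigma^2 = p(1-p)$) and all function names are choices made here, not part of the text.

```python
import math

def std_normal_cdf(t: float) -> float:
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def exact_probability(n: int, p: float, gamma: float, delta: float) -> float:
    """P{gamma <= (1/n) sum X_j - p < delta} for independent X_j taking the
    value 1 with probability p and 0 with probability 1 - p."""
    total = 0.0
    for k in range(n + 1):
        deviation = k / n - p
        if gamma <= deviation < delta:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

def normal_approximation(n: int, p: float, gamma: float, delta: float) -> float:
    """Normal integral between gamma sqrt(n)/sigma and delta sqrt(n)/sigma."""
    sigma = math.sqrt(p * (1 - p))
    scale = math.sqrt(n) / sigma
    return std_normal_cdf(delta * scale) - std_normal_cdf(gamma * scale)

if __name__ == "__main__":
    n, p = 400, 0.5
    print(exact_probability(n, p, -0.05, 0.05))
    print(normal_approximation(n, p, -0.05, 0.05))
```

The two printed numbers are close; the residual difference reflects the discreteness of the binomial distribution at the interval endpoints.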

A second reason for the significance of Example 3 is seen from the 
observation that in the domain of applications we often encounter norm- 
ally distributed random variables if they can be interpreted as the sum 
of a large number of independent variables. We may have such a situation, 
for example, with the total error of a measurement. 

Thus, in general, we are faced with the problem of deciding when, for an independent sequence $(X_n)$ of real random variables, the sequence $(P_{S_n})$ introduced in Example 3 converges weakly to the standard normal distribution $\nu_{0,1}$. In the sense of the following definition, this is the question of the validity of the Central Limit Theorem.


9.2.1. Definition. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real, square integrable random variables with variance $V(X_n) > 0$. We say that the Central Limit Theorem holds for the sequence if the sequence $(P_{S_n})$ of distributions of the so-called standardized sums
$$S_n = \frac{\sum_{j=1}^{n} (X_j - E(X_j))}{\sigma(X_1 + \cdots + X_n)} \tag{9.2.1}$$
converges weakly to $\nu_{0,1}$.


Thus Example 3 tells us that the Central Limit Theorem holds for the 
sequence (Xn) in particular if the sequence is identically distributed. 


For a given independent sequence $(X_n)$ of real, square integrable random variables with variances $V(X_n) > 0$, we let
$$\sigma_n = \sigma(X_n), \tag{9.2.2}$$
$$s_n = \sigma(X_1 + \cdots + X_n) = (\sigma_1^2 + \cdots + \sigma_n^2)^{1/2}, \tag{9.2.3}$$
$$\eta_n = E(X_n). \tag{9.2.4}$$

The reasoning of the preceding section makes it clear that a sequence $(X_n)$ can satisfy the Central Limit Theorem only when none of the variables $X_n$ exerts a dominating influence on the distribution of the sum $S_n$. On the basis of the next theorem, we shall see that it is not sufficient to require the family of random variables
$$X_{nj} = \frac{1}{s_n}(X_j - E(X_j)) \quad (j = 1, \ldots, n;\ n = 1, 2, \ldots) \tag{9.2.5}$$
to be asymptotically negligible. Rather, the following condition will turn out to be crucial:
$$\lim_{n\to\infty} L_n(\varepsilon) = 0 \quad \text{for every } \varepsilon > 0, \tag{9.2.6}$$
where
$$L_n(\varepsilon) = \frac{1}{s_n^2}\sum_{j=1}^{n} \int_{|x - \eta_j| \ge \varepsilon s_n} (x - \eta_j)^2\,P_{X_j}(dx). \tag{9.2.6'}$$

This is known as the Lindeberg condition. We immediately give several examples.
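
For discrete distributions the Lindeberg term (9.2.6') is a finite sum and can be evaluated directly. The following is a minimal sketch; the representation of each $P_{X_j}$ as a list of (value, probability) pairs and the two test families are assumptions made for illustration.

```python
import math

def lindeberg_term(dists, eps):
    """L_n(eps) from (9.2.6') for independent discrete X_1, ..., X_n.
    dists[j] is a list of (value, probability) pairs describing P_{X_j}."""
    means = [sum(v * q for v, q in d) for d in dists]
    variances = [
        sum((v - m) ** 2 * q for v, q in d) for d, m in zip(dists, means)
    ]
    s_n = math.sqrt(sum(variances))
    # Sum the truncated second moments over the region |x - eta_j| >= eps * s_n.
    tail = sum(
        (v - m) ** 2 * q
        for d, m in zip(dists, means)
        for v, q in d
        if abs(v - m) >= eps * s_n
    )
    return tail / s_n**2

if __name__ == "__main__":
    # Identically distributed signs +-1: the term vanishes once eps * sqrt(n) > 1.
    signs = [[(-1.0, 0.5), (1.0, 0.5)] for _ in range(100)]
    print(lindeberg_term(signs, 0.5))
    # Summands +-2^j: the last variable dominates and the term stays near 3/4.
    growing = [[(-(2.0**j), 0.5), (2.0**j, 0.5)] for j in range(1, 11)]
    print(lindeberg_term(growing, 0.5))
```

The second family illustrates a sequence in which a single summand dominates, so the Lindeberg condition fails.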
Examples


1. If the sequence $(X_n)$ is identically distributed, that is, $\mu = P_{X_n}$ and then $\eta = \eta_n$, $\sigma = \sigma_n$ are independent of $n$, then the Lindeberg condition is satisfied. We need note only that $s_n^2 = n\sigma^2$ and thus $\lim s_n = +\infty$ and
$$\lim_{n\to\infty} L_n(\varepsilon) = \lim_{n\to\infty} \frac{1}{\sigma^2}\int_{|x - \eta| \ge \varepsilon\sigma\sqrt{n}} (x - \eta)^2\,\mu(dx) = 0$$
holds.


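2. If the sequence $(X_n)$ satisfies the so-called Lyapunov condition, that is, there exists a $\delta > 0$ such that
$$\lim_{n\to\infty} \frac{1}{s_n^{2+\delta}}\sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta}) = 0, \tag{9.2.7}$$
then the Lindeberg condition is satisfied. Indeed, for every $\varepsilon > 0$, $|x - \eta_j| \ge \varepsilon s_n$ implies the inequality $|x - \eta_j|^{2+\delta} \ge |x - \eta_j|^2(\varepsilon s_n)^\delta$. Hence,
$$L_n(\varepsilon) \le \frac{1}{\varepsilon^\delta s_n^{2+\delta}}\sum_{j=1}^{n} \int_{|x - \eta_j| \ge \varepsilon s_n} |x - \eta_j|^{2+\delta}\,P_{X_j}(dx) \le \frac{1}{\varepsilon^\delta}\,\frac{1}{s_n^{2+\delta}}\sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta})$$
for each $\varepsilon > 0$.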


3. If the sequence $(X_n)$ is uniformly bounded and $\lim s_n = +\infty$, then the Lyapunov condition is satisfied for every $\delta > 0$ and thus the Lindeberg condition is satisfied. Indeed, by hypothesis, there is an $\alpha > 0$ with $|X_n| \le \alpha/2$, and hence with $|\eta_n| \le E(|X_n|) \le \alpha/2$. Hence $|X_n - \eta_n| \le \alpha$ for all $n = 1, 2, \ldots$. But then for all $\delta > 0$ the Lyapunov condition follows from the inequality
$$\frac{1}{s_n^{2+\delta}}\sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta}) \le \frac{\alpha^\delta}{s_n^{2+\delta}}\sum_{j=1}^{n} E(|X_j - \eta_j|^2) = \left(\frac{\alpha}{s_n}\right)^\delta.$$
The connection between the Lindeberg condition and the asymptotic negligibility of the family $(X_{nj})$ defined in (9.2.5) is the following:


9.2.2. Lemma. If the sequence $(X_n)_{n \in \mathbb{N}}$ satisfies the Lindeberg condition, then it also satisfies the following condition, named after W. Feller:
$$\lim_{n\to\infty} \max_{1 \le j \le n} \frac{\sigma_j}{s_n} = 0. \tag{9.2.8}$$

From the Feller condition it follows that the family $(X_{nj})_{j=1,\ldots,n;\ n=1,2,\ldots}$ defined by (9.2.5) is asymptotically negligible.
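Proof. For every $j = 1, 2, \ldots, n$,
$$\sigma_j^2 = \int (x - \eta_j)^2\,P_{X_j}(dx) \le \varepsilon^2 s_n^2 + \int_{|x - \eta_j| \ge \varepsilon s_n} (x - \eta_j)^2\,P_{X_j}(dx) \le \varepsilon^2 s_n^2 + \sum_{i=1}^{n} \int_{|x - \eta_i| \ge \varepsilon s_n} (x - \eta_i)^2\,P_{X_i}(dx)$$
and hence
$$\max_{1 \le j \le n} \left(\frac{\sigma_j}{s_n}\right)^2 \le \varepsilon^2 + L_n(\varepsilon)$$
for arbitrary $\varepsilon > 0$. Thus the Lindeberg condition implies the Feller condition. Now we have $(\sigma_j/s_n)^2 = V(X_{nj})$, that is,
$$\lim_{n\to\infty} \max_{1 \le j \le n} V(X_{nj}) = 0$$
by the Feller condition. The rest of the assertion now follows from Section 9.1, Example 5. □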


Remark. 1. The Feller condition is equivalent to the simultaneous validity of
$$\lim_{n\to\infty} s_n = +\infty, \tag{9.2.9}$$
$$\lim_{n\to\infty} \frac{\sigma_n}{s_n} = 0, \tag{9.2.10}$$
as we can verify immediately.


Now we are in a position to formulate and prove the main result. 


9.2.3. Theorem (Lindeberg-Feller). For every independent sequence $(X_n)_{n \in \mathbb{N}}$ of real, square integrable random variables with variances $V(X_n) > 0$, the following statements are equivalent:

(a) The Central Limit Theorem holds and the sequence satisfies the Feller condition.

(b) The Central Limit Theorem holds and the family $(X_{nj})_{j=1,\ldots,n;\ n=1,2,\ldots}$ is asymptotically negligible.

(c) The sequence satisfies the Lindeberg condition.


Proof. By Lemma 9.2.2 we need prove only the implications (b) ⇒ (c) and (c) ⇒ (a), and of the latter, only the part concerning the validity of the Central Limit Theorem. We can assume $\eta_n = E(X_n) = 0$ for all $n$ for both proofs; it suffices to center every variable on its expected value. If we now set
$$\mu_{nj} = P_{X_j/s_n} \quad\text{and}\quad \tau_{nj} = \frac{\sigma_j}{s_n} \quad (j = 1, \ldots, n;\ n = 1, 2, \ldots),$$
then $\mu_{nj} \in \mathcal{M}^1_+(\mathbb{R})$ has 0 and $\tau_{nj}^2$ as its first and second moments, respectively, and $\sum_{j=1}^{n} \tau_{nj}^2 = 1$. The Lindeberg term $L_n(\varepsilon)$ can then be written as
$$L_n(\varepsilon) = \sum_{j=1}^{n} \int_{|y| \ge \varepsilon} y^2\,\mu_{nj}(dy).$$


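1. First we prove the remaining part of the implication (c) ⇒ (a). By (8.3.4) and (8.3.5) we have
$$\hat\mu(x) = 1 - \frac{\sigma^2}{2}x^2 + \Theta(x)\,\frac{x^2}{2}$$
for the Fourier transform of any measure $\mu \in \mathcal{M}^1_+(\mathbb{R})$ with $\int x\,\mu(dx) = 0$ and $\sigma^2 = \int x^2\,\mu(dx) < +\infty$, where the function $\Theta: \mathbb{R} \to \mathbb{C}$ satisfies the condition
$$|\Theta(x)| \le \sup_{0 \le \vartheta \le 1} |\hat\mu''(\vartheta x) - \hat\mu''(0)| \quad (x \in \mathbb{R}).$$
Moreover, by Theorem 8.3.3,
$$\hat\mu''(x) = -\int e^{ixy} y^2\,\mu(dy),$$
that is,
$$|\Theta(x)| \le \sup_{0 \le \vartheta \le 1} \int |e^{i\vartheta xy} - 1|\,y^2\,\mu(dy).$$

If we note that the convergence of $e^{ity}$ to 1 as $y \to 0$ is uniform on every compact $t$-interval, then for given $x \in \mathbb{R}$ and for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|e^{i\vartheta xy} - 1| < \varepsilon$ for all $y \in \mathbb{R}$ with $|y| < \delta$ and all $\vartheta \in [0,1]$. Therefore, for fixed $x \in \mathbb{R}$, we have the inequalities
$$|\Theta(x)| \le \sup_{0 \le \vartheta \le 1} \int_{|y| < \delta} |e^{i\vartheta xy} - 1|\,y^2\,\mu(dy) + \sup_{0 \le \vartheta \le 1} \int_{|y| \ge \delta} |e^{i\vartheta xy} - 1|\,y^2\,\mu(dy) \le \varepsilon\sigma^2 + 2\int_{|y| \ge \delta} y^2\,\mu(dy).$$

We apply this result to both $\mu_{nj}$ and the normal distribution $\nu_{0,\tau_{nj}^2}$. On the one hand, we obtain
$$\hat\mu_{nj}(x) = 1 - \tau_{nj}^2\frac{x^2}{2} + \Theta_{nj}(x)\frac{x^2}{2}, \qquad |\Theta_{nj}(x)| \le \varepsilon\tau_{nj}^2 + 2\int_{|y| \ge \delta} y^2\,\mu_{nj}(dy) \tag{9.2.11}$$
and, on the other hand,
$$\hat\nu_{0,\tau_{nj}^2}(x) = e^{-\tau_{nj}^2 x^2/2} = 1 - \tau_{nj}^2\frac{x^2}{2} + \Theta^*_{nj}(x)\frac{x^2}{2}, \qquad |\Theta^*_{nj}(x)| \le \varepsilon\tau_{nj}^2 + 2\int_{|y| \ge \delta} y^2\,\nu_{0,\tau_{nj}^2}(dy). \tag{9.2.12}$$
Here $\delta$ depends on $\varepsilon$ and $x$ in the described manner.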


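The rest of this part of the proof is based on a simple majorization, which is derived from the inequality
$$\left|\prod_{\kappa=1}^{k} a_\kappa - \prod_{\kappa=1}^{k} b_\kappa\right| \le \sum_{\kappa=1}^{k} |a_\kappa - b_\kappa|,$$
which is valid for complex numbers $a_1, \ldots, a_k$, $b_1, \ldots, b_k$ of absolute value $\le 1$.¹ Using the independence of the sequence $(X_n)$, we obtain
$$|\varphi_{S_n}(x) - e^{-x^2/2}| = \left|\prod_{j=1}^{n} \hat\mu_{nj}(x) - \prod_{j=1}^{n} e^{-\tau_{nj}^2 x^2/2}\right| \le \sum_{j=1}^{n} |\hat\mu_{nj}(x) - e^{-\tau_{nj}^2 x^2/2}| = \frac{x^2}{2}\sum_{j=1}^{n} |\Theta_{nj}(x) - \Theta^*_{nj}(x)|$$
for the characteristic function $\varphi_{S_n}$ of $S_n = \sum_{j=1}^{n} (X_j/s_n)$, and hence, by (9.2.11) and (9.2.12), the majorization
$$|\varphi_{S_n}(x) - e^{-x^2/2}| \le x^2\left[\varepsilon + \sum_{j=1}^{n} \int_{|y| \ge \delta} y^2\,\mu_{nj}(dy) + \sum_{j=1}^{n} \int_{|y| \ge \delta} y^2\,\nu_{0,\tau_{nj}^2}(dy)\right], \tag{9.2.13}$$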


which holds for all $n = 1, 2, \ldots$. If we now set $a_n = \max_{1 \le j \le n} \tau_{nj}$ for the number involved in the Feller condition, then
$$\sum_{j=1}^{n} \int_{|y| \ge \delta} y^2\,\nu_{0,\tau_{nj}^2}(dy) = \sum_{j=1}^{n} \tau_{nj}^2 \int_{|y| \ge \delta/\tau_{nj}} y^2\,\nu_{0,1}(dy) \le \sum_{j=1}^{n} \tau_{nj}^2 \int_{|y| \ge \delta/a_n} y^2\,\nu_{0,1}(dy) = \int_{|y| \ge \delta/a_n} y^2\,\nu_{0,1}(dy)$$
and hence
$$\lim_{n\to\infty} \sum_{j=1}^{n} \int_{|y| \ge \delta} y^2\,\nu_{0,\tau_{nj}^2}(dy) = 0,$$
since $\lim a_n = 0$ by Lemma 9.2.2. From this, the Lindeberg condition, and (9.2.13), it now follows that
$$\lim_{n\to\infty} \varphi_{S_n}(x) = \hat\nu_{0,1}(x) \quad (x \in \mathbb{R})$$
and thus, by the Continuity Theorem, we have the weak convergence of $(P_{S_n})$ to $\nu_{0,1}$.

¹ This follows from the triangle inequality applied to the identity $\prod a_\kappa - \prod b_\kappa = (a_1 - b_1)\,a_2 \cdots a_k + b_1(a_2 - b_2)\,a_3 \cdots a_k + \cdots + b_1 \cdots b_{k-1}(a_k - b_k)$.


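2. Finally, we prove (b) ⇒ (c). Since the family of random variables $X_{nj} = (1/s_n)X_j$ is asymptotically negligible and $P_{X_{nj}} = \mu_{nj}$, by Lemma 9.1.2 we have for every fixed $x \in \mathbb{R}$:
$$\lim_{n\to\infty} \max_{1 \le j \le n} |\hat\mu_{nj}(x) - 1| = 0. \tag{9.2.14}$$

Hence it follows that
$$\max_{1 \le j \le n} |\hat\mu_{nj}(x) - 1| \le \tfrac12 \quad \text{for } n \text{ sufficiently large}. \tag{9.2.15}$$

For every complex number $z$ with $|z| \le \frac12$,
$$\log(1 + z) = z + \sum_{\nu=2}^{\infty} \frac{(-1)^{\nu-1}}{\nu} z^\nu$$
is the Taylor expansion of the principal branch of the logarithm and
$$\left|\sum_{\nu=2}^{\infty} \frac{(-1)^{\nu-1}}{\nu} z^\nu\right| \le \frac12\sum_{\nu=2}^{\infty} |z|^\nu = \frac12\,\frac{|z|^2}{1 - |z|} \le |z|^2.$$

Then by (9.2.15), for all sufficiently large $n$ and all $j$,
$$\log \hat\mu_{nj}(x) = \hat\mu_{nj}(x) - 1 + R_{nj}(x), \tag{9.2.16}$$
where $|R_{nj}(x)| \le |\hat\mu_{nj}(x) - 1|^2$. If, moreover, we take into account the inequalities²
$$|\hat\mu_{nj}(x) - 1| = \left|\int (e^{ixy} - 1 - ixy)\,\mu_{nj}(dy)\right| \le \int |e^{ixy} - 1 - ixy|\,\mu_{nj}(dy) \le \frac{x^2}{2}\int y^2\,\mu_{nj}(dy) = \tau_{nj}^2\,\frac{x^2}{2},$$
which follow from $0 = E(X_j) = \eta_j$, then we obtain that $|R_{nj}(x)| \le |\hat\mu_{nj}(x) - 1|\,\tau_{nj}^2\,(x^2/2)$ and hence, since $\sum_j \tau_{nj}^2 = 1$, we have
$$\left|\sum_{j=1}^{n} R_{nj}(x)\right| \le \frac{x^2}{2}\max_{1 \le j \le n} |\hat\mu_{nj}(x) - 1|$$
for sufficiently large $n$, that is, by (9.2.14),
$$\lim_{n\to\infty} \sum_{j=1}^{n} R_{nj}(x) = 0. \tag{9.2.17}$$

² For all $t \in \mathbb{R}$, $|e^{it} - 1 - it| \le t^2/2$. In the proof we can obviously assume that $t \ge 0$. Then we conclude as follows: $|e^{it} - 1| = \left|\int_0^t e^{i\tau}\,d\tau\right| \le t$; $|e^{it} - 1 - it| = \left|\int_0^t (e^{i\tau} - 1)\,d\tau\right| \le \int_0^t \tau\,d\tau = t^2/2$.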


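Moreover, since we assume that the Central Limit Theorem holds, we know that
$$\varphi_{S_n}(x) = \prod_{j=1}^{n} \hat\mu_{nj}(x)$$
converges to $\hat\nu_{0,1}(x) = e^{-x^2/2}$ as $n \to \infty$. Therefore from (9.2.16) and (9.2.17) we obtain
$$\lim_{n\to\infty} \exp\left(\sum_{j=1}^{n} (\hat\mu_{nj}(x) - 1)\right) = e^{-x^2/2}$$
and hence
$$\lim_{n\to\infty} \exp\left(\sum_{j=1}^{n} \operatorname{Re}(\hat\mu_{nj}(x) - 1)\right) = \lim_{n\to\infty} \left|\exp\left(\sum_{j=1}^{n} (\hat\mu_{nj}(x) - 1)\right)\right| = e^{-x^2/2},$$
that is, finally,
$$\lim_{n\to\infty} \sum_{j=1}^{n} \operatorname{Re}(\hat\mu_{nj}(x) - 1) = -\frac{x^2}{2}. \tag{9.2.18}^3$$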


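³ The reader should note that the equality $\log \varphi_{S_n}(x) = \sum \log \hat\mu_{nj}(x) = \sum (\hat\mu_{nj}(x) - 1) + \sum R_{nj}(x)$ holds only modulo $2\pi i$ and therefore the sign of the real part cannot simply be omitted.

Simple reasoning now completes the proof: For arbitrary $x \ne 0$, the sequence
$$T'_n = \frac{x^2}{2} + \sum_{j=1}^{n} \operatorname{Re}(\hat\mu_{nj}(x) - 1) = \frac{x^2}{2} - \sum_{j=1}^{n} \int (1 - \cos xy)\,\mu_{nj}(dy)$$
approaches zero as $n \to \infty$, by (9.2.18). For arbitrary given $\varepsilon > 0$, consider the expression
$$T''_n = \sum_{j=1}^{n} \int_{|y| \ge \varepsilon} (1 - \cos xy)\,\mu_{nj}(dy).$$

Using the Chebyshev inequality (6.3.11) we can majorize this as follows:
$$T''_n \le 2\sum_{j=1}^{n} \int_{|y| \ge \varepsilon} \mu_{nj}(dy) = 2\sum_{j=1}^{n} P\{|X_j| \ge \varepsilon s_n\} \le 2\sum_{j=1}^{n} \frac{\sigma_j^2}{\varepsilon^2 s_n^2} = \frac{2}{\varepsilon^2}.$$

This leads to the following bound for $L_n(\varepsilon)$. Since $\sum_j \int y^2\,\mu_{nj}(dy) = \sum_j \tau_{nj}^2 = 1$,
$$\frac{x^2}{2}L_n(\varepsilon) = \frac{x^2}{2} - \sum_{j=1}^{n} \int_{|y| < \varepsilon} \frac{x^2 y^2}{2}\,\mu_{nj}(dy)$$
$$\le \frac{x^2}{2} - \sum_{j=1}^{n} \int_{|y| < \varepsilon} |1 - e^{ixy} + ixy|\,\mu_{nj}(dy)$$
$$\le \frac{x^2}{2} - \left|\sum_{j=1}^{n} \int_{|y| < \varepsilon} (1 - e^{ixy} + ixy)\,\mu_{nj}(dy)\right|$$
$$\le \frac{x^2}{2} - \sum_{j=1}^{n} \int_{|y| < \varepsilon} (1 - \cos xy)\,\mu_{nj}(dy) = T'_n + T''_n \le T'_n + \frac{2}{\varepsilon^2}$$
(for every $x \ne 0$ and $\varepsilon > 0$). The Lindeberg condition now follows, since
$$\lim_{n\to\infty} T'_n = 0 \ \text{ for all } x \ne 0 \quad\text{and}\quad \lim_{x\to+\infty} \frac{2}{x^2}\cdot\frac{2}{\varepsilon^2} = 0 \ \text{ for each } \varepsilon > 0.$$

Thus the theorem is completely proved. □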


Remarks. 2. Because of Example 1, the result of Section 9.1, Example 3, and also the De Moivre-Laplace Theorem are contained in the Lindeberg-Feller Theorem as special cases.


3. In the Lindeberg-Feller Theorem we cannot do without the Feller condition or the asymptotic negligibility of the family $(X_{nj})$ in showing (a) ⇒ (c) or (b) ⇒ (c), respectively. This can be seen from the following.


Example


4. Let $(\sigma_n)$ be a sequence of positive numbers with $\sum_{n=1}^{\infty} \sigma_n^2 < \infty$ and let $(X_n)$ be an independent sequence (which exists by Corollary 1.5.4) of real random variables with $P_{X_n} = \nu_{0,\sigma_n^2}$ ($n = 1, 2, \ldots$). Then the Central Limit Theorem holds since each of the standardized sums $S_n$ has distribution $\nu_{0,1}$. On the other hand, the Feller condition and therefore the Lindeberg condition are violated since $s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$, $n = 1, 2, \ldots$, is bounded and thus (9.2.9) is not satisfied.


PROBLEMS 


1. Let $(X_n)$ be an independent sequence of real random variables with $\nu_{0,1}$ as common distribution. Prove that the squared sequence $(X_n^2)$ satisfies the Central Limit Theorem. Can $\nu_{0,1}$ be replaced by other distributions in order to have the same conclusion?

2. Let $(X_n)$ be an independent sequence of real, square-integrable random variables. Prove: $\lim s_n = +\infty$ if and only if $\sum_{n=1}^{\infty} V(X_n) = +\infty$ [where $s_n = \sigma(X_1 + \cdots + X_n)$].

3. Let $(X_n)$ be an independent sequence of real random variables such that each $X_n$ only attains the values 0 and 1 with positive probability. Prove that $(X_n)$ satisfies the Central Limit Theorem if the series
$$\sum_{n=1}^{\infty} P\{X_n = 0\} \cdot P\{X_n = 1\}$$
diverges.

4. Let $\lambda$ be a real number, and let $(X_n)_{n \in \mathbb{N}}$ be a corresponding independent sequence of real random variables with the properties stated in part (b) of the Problem of Section 5.4. Prove:

(a) For $\lambda < -\frac12$ the sequence $(X_n)$ does not satisfy the Feller condition.
(b) For $\lambda \ge -\frac12$ one has


(c) Conclude that (X,) satisfies the Central Limit Theorem for all 
MEE 


5. Assume that the Central Limit Theorem holds for an independent sequence $(X_n)$ of real, square-integrable random variables with variances $V(X_n) > 0$. Prove that
$$P\left\{\left|\frac{1}{n}\sum_{i=1}^{n} (X_i - E(X_i))\right| < \varepsilon\right\} - \frac{1}{\sqrt{2\pi}}\int_{-\varepsilon(n/s_n)}^{\varepsilon(n/s_n)} e^{-\alpha^2/2}\,d\alpha$$
converges to zero as $n \to \infty$ uniformly for all $\varepsilon > 0$. Conclude: The sequence $(X_n)$ does not satisfy the weak law of large numbers if the sequence $(n/s_n)$ is bounded.

6. Apply the result of Problem 5 to the sequence (X,) of Problem 4 and 
prove that (X,) does not satisfy the weak law of large numbers if 
^ 2 $. This completes the discussion of Section 6.4, Problem 1 and 
Section 6.5, Problem 2. 


9.3 INFINITELY DIVISIBLE DISTRIBUTIONS 


In the first four examples of Section 9.1, the probability measures $\varepsilon_0$, $\pi_a$, $\nu_{0,1}$, and $\gamma_1$ on $\mathbb{R}$ were obtained as limit distributions of sums $S_n = X_{n1} + \cdots + X_{nk_n}$ of the asymptotically negligible families $(X_{nj})$ considered there. By Section 5.3, Remark 4, these probability measures are all infinitely divisible. We now show that this is no coincidence. But in order to do this we first establish some properties of such probability measures on $\mathbb{R}$ (or, more generally, on $\mathbb{R}^p$, $p = 1, 2, \ldots$).


9.3.1. Definition. A measure $\mu \in \mathcal{M}^1_+(\mathbb{R}^p)$ is said to be infinitely divisible if for every natural number $n$ there exists a measure $\mu_n \in \mathcal{M}^1_+(\mathbb{R}^p)$ whose $n$-fold convolution product with itself is equal to $\mu$:
$$\mu = \mu_n * \mu_n * \cdots * \mu_n \quad (n \text{ factors}). \tag{9.3.1}$$
By the Uniqueness Theorem for Fourier transforms this is equivalent to
$$\hat\mu = (\hat\mu_n)^n. \tag{9.3.2}$$
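
Condition (9.3.2) can be checked numerically for the Poisson distribution. The sketch below assumes the standard closed form $\hat\pi_a(x) = \exp(a(e^{ix} - 1))$ for the Fourier transform of $\pi_a$ and takes $\pi_{a/n}$ as the candidate $n$th convolution root; the function names are hypothetical.

```python
import cmath

def poisson_cf(x: float, a: float) -> complex:
    """Fourier transform of the Poisson distribution pi_a (assumed closed form)."""
    return cmath.exp(a * (cmath.exp(1j * x) - 1))

def nth_power_gap(x: float, a: float, n: int) -> float:
    """|(pi_{a/n})^(x)^n - pi_a^(x)|, which (9.3.2) requires to vanish."""
    return abs(poisson_cf(x, a / n) ** n - poisson_cf(x, a))

if __name__ == "__main__":
    for n in (2, 5, 12):
        print(n, nth_power_gap(0.7, 3.0, n))
```

The printed gaps are zero up to rounding error, consistent with $\pi_a$ being infinitely divisible.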
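Example

1. If $\mu^{(1)}, \ldots, \mu^{(q)}$ are finitely many infinitely divisible measures on $\mathbb{R}^p$, then $\nu = \mu^{(1)} \otimes \cdots \otimes \mu^{(q)}$ is an infinitely divisible measure on $\mathbb{R}^{pq}$. In fact, for every $n \in \mathbb{N}$ and $i = 1, \ldots, q$ there are measures $\mu_n^{(i)} \in \mathcal{M}^1_+(\mathbb{R}^p)$ whose $n$-fold convolution product $\mu_n^{(i)} * \cdots * \mu_n^{(i)}$ is equal to $\mu^{(i)}$. But then
$$\nu_n = \mu_n^{(1)} \otimes \cdots \otimes \mu_n^{(q)}$$
is a measure from $\mathcal{M}^1_+(\mathbb{R}^{pq})$ with $\nu$ as its $n$-fold convolution product.⁴


⁴ If $\mu_1, \ldots, \mu_n$ and $\nu_1, \ldots, \nu_n$ are finite Borel measures on $\mathbb{R}^p$, then
$$(\mu_1 \otimes \cdots \otimes \mu_n) * (\nu_1 \otimes \cdots \otimes \nu_n) = (\mu_1 * \nu_1) \otimes \cdots \otimes (\mu_n * \nu_n).$$
This follows at once by Fourier transforms. By Theorems 8.1.5 and 8.1.6,
$$(x_1, \ldots, x_n) \mapsto \hat\mu_1(x_1) \cdots \hat\mu_n(x_n) \cdot \hat\nu_1(x_1) \cdots \hat\nu_n(x_n) = \hat\mu_1(x_1)\hat\nu_1(x_1) \cdots \hat\mu_n(x_n)\hat\nu_n(x_n)$$
is the Fourier transform of both measures involved in the equality to be proved.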
9.3.2. Theorem. If $\mu$ is an infinitely divisible measure on $\mathbb{R}^p$, then
$$\hat\mu(x) \ne 0 \quad \text{for all } x \in \mathbb{R}^p. \tag{9.3.3}$$

There is exactly one continuous mapping $\psi: \mathbb{R}^p \to \mathbb{R}$ such that $\psi(0) = 0$ and
$$\hat\mu = |\hat\mu|\,e^{i\psi}. \tag{9.3.4}$$


Proof. By (9.3.2) we have $|\hat\mu_n| = \sqrt[n]{|\hat\mu|}$ for all $n \in \mathbb{N}$ and hence $\lim_{n\to\infty} |\hat\mu_n(x)| = 1$ or $0$ depending on whether $\hat\mu(x) \ne 0$ or $\hat\mu(x) = 0$. Since $\hat\mu(0) = 1$, we have $\hat\mu(x) \ne 0$ and thus $\lim |\hat\mu_n(x)| = 1$ in a neighborhood of zero; hence $\lim |\hat\mu_n| = \lim |\hat\mu_n|^2$ is continuous at $x = 0$. By Theorem 8.1.5, $|\hat\mu_n|^2$ is the Fourier transform of the measure $\mu_n * S(\mu_n)$, where $S$ denotes the reflection $x \mapsto -x$. By the Continuity Theorem, $\lim |\hat\mu_n|^2$ must itself be the Fourier transform of a measure from $\mathcal{M}^b_+(\mathbb{R}^p)$, and in particular must be continuous. Since $\lim |\hat\mu_n| = \lim |\hat\mu_n|^2$ assumes only the values 1 and 0 and the former value is taken on at $x = 0$, it follows that $\lim |\hat\mu_n(x)| = 1$ for all $x \in \mathbb{R}^p$. As a consequence of the introductory statements, we then have $\hat\mu(x) \ne 0$ for all $x \in \mathbb{R}^p$. The rest of the assertion follows from Theorem A.2.⁵ □


9.3.3. Corollary 1. For every infinitely divisible measure $\mu$ on $\mathbb{R}^p$, each of the measures $\mu_n \in \mathcal{M}^1_+(\mathbb{R}^p)$ is uniquely determined by the equality (9.3.1). With the notation introduced in Theorem 9.3.2, we have
$$\hat\mu_n = \sqrt[n]{|\hat\mu|}\;e^{i\psi/n} \quad (n = 1, 2, \ldots) \tag{9.3.5}$$
for its Fourier transform.


Proof. By Corollary A.2, for every $n = 1, 2, \ldots$, there is exactly one continuous mapping $\chi: \mathbb{R}^p \to \mathbb{C}$ such that $\chi(0) = 1$ and $\chi^n = \hat\mu$. This is given by
$$\chi = \sqrt[n]{|\hat\mu|}\;e^{i\psi/n}.$$

Since $\hat\mu_n$ is continuous and $\hat\mu_n(0) = 1$, we see that $\chi = \hat\mu_n$ by (9.3.2). By the Uniqueness Theorem 8.2.4, $\mu_n$ is uniquely determined by $\chi$ and thus also by $\mu$. □


9.3.4. Corollary 2. The sequence of measures $\mu_n \in \mathcal{M}^1_+(\mathbb{R}^p)$ associated with an infinitely divisible measure $\mu$ on $\mathbb{R}^p$ by (9.3.1) is weakly convergent to $\varepsilon_0$.


⁵ This is the second proposition in the Appendix.

Proof. From (9.3.3) and (9.3.5) it follows that $\lim_{n\to\infty} \hat\mu_n(x) = 1$ for all $x \in \mathbb{R}^p$. Since $\hat\varepsilon_0 = 1$, the assertion now follows from the Uniqueness Theorem. □


Remarks. 1. Formula (9.3.5) does not say that $\hat\mu_n(x)$ is the principal value of the $n$th root of $\hat\mu(x)$. To see this, it suffices to consider the measure $\mu = \varepsilon_1$ on $\mathbb{R}$. Then $\mu_n = \varepsilon_{1/n}$, $\hat\mu(x) = e^{ix}$, and $\hat\mu_n(x) = e^{ix/n}$. Hence $\hat\mu_n(2\pi) = e^{2\pi i/n}$, which for $n = 2, 3, \ldots$ is not the principal value of the $n$th root of $\hat\mu(2\pi) = 1$.


2. Theorem 9.3.2 often makes it possible to verify that a measure $\mu \in
\mathfrak{M}^1(\mathbb{R}^p)$ is not infinitely divisible. Examples of this are the so-called
rectangular (or uniform) distribution on $\mathbb{R}$

$$\mu = \tfrac{1}{2}\, 1_{[-1,1]}\, \lambda^1$$

with Fourier transform

$$x \mapsto \frac{\sin x}{x} \qquad (x \neq 0)$$

and the binomial distribution $\beta_n^p$ for $p = \tfrac{1}{2}$ with Fourier transform $x \mapsto
2^{-n}(1 + e^{ix})^n$. Both Fourier transforms vanish at $x = \pi$.
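
As a quick numerical illustration of Remark 2, the sketch below evaluates both Fourier transforms at $x = \pi$. It assumes that the rectangular distribution is the uniform distribution on $[-1, 1]$, so that its Fourier transform is $\sin x / x$; the function names are illustrative.

```python
import cmath
import math


def uniform_ft(x):
    """Fourier transform of the uniform distribution on [-1, 1] (assumed interval)."""
    return 1.0 if x == 0 else math.sin(x) / x


def binomial_half_ft(x, n):
    """Fourier transform of the binomial distribution with p = 1/2 and n trials."""
    return (0.5 * (1 + cmath.exp(1j * x))) ** n
```

Evaluating either function at `math.pi` gives a value that is zero up to rounding, so by Theorem 9.3.2 neither measure can be infinitely divisible.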


3. For $0 < p < 1$ and $p \neq \tfrac{1}{2}$, the Fourier transform of the binomial
distribution $\beta_n^p$ has no real zeros. Nonetheless, $\beta_n^p$ is not infinitely divisible.
We can represent $\beta_1^p$ only in a trivial way in the form $\beta_1^p = \mu * \nu$ with
measures $\mu, \nu \in \mathfrak{M}^1(\mathbb{R})$.$^6$ Here trivial means a representation in which
either $\mu$ or $\nu$ is a unit mass $\varepsilon_a$. It is easy to see that measures $\mu$ and $\nu \in
\mathfrak{M}^1(\mathbb{R})$ with $\mu * \nu = \beta_1^p$ must be discrete and that the numbers $s$ and $t$ of
points having positive $\mu$- and $\nu$-measures, respectively, must satisfy
$s + t - 1 \le 2$; that is, $s = 1$ or $t = 1$. By considering such a decomposition
into $n + 1$ factors instead of 2, one can prove that for $p \in\ ]0,1[$,
$\beta_n^p$ is not infinitely divisible.


Corollary 1 above motivates us to exhibit a further aspect of the con- 
cept of infinitely divisible measures. To this end, we give: 


9.3.5. Definition. Let $(\mu_t)_{t \in \mathbb{R}_+}$ be a family of probability measures
$\mu_t \in \mathfrak{M}^1(\mathbb{R}^p)$ on $\mathbb{R}^p$ indexed by the elements of $\mathbb{R}_+$. Then if

$$\mu_{s+t} = \mu_s * \mu_t, \qquad \text{for all } s, t \in \mathbb{R}_+, \tag{9.3.6}$$

we call $(\mu_t)_{t \in \mathbb{R}_+}$ a convolution semigroup of probability measures on $\mathbb{R}^p$. It
is called continuous if the mapping $t \mapsto \mu_t$ of $\mathbb{R}_+$ into $\mathfrak{M}^1(\mathbb{R}^p)$ is vaguely,
and hence also weakly continuous.


6 Therefore $\beta_1^p$ is a so-called indecomposable probability measure.

The following theorem now shows the connection with the previous
investigation:


9.3.6. Theorem. Each of the measures $\mu_t$ of a convolution semigroup
$(\mu_t)_{t \in \mathbb{R}_+}$ of probability measures on $\mathbb{R}^p$ is infinitely divisible. Conversely,
if $\mu$ is an infinitely divisible measure on $\mathbb{R}^p$ and $t_0 > 0$ is a positive real
number, then there is exactly one continuous convolution semigroup
$(\mu_t)_{t \in \mathbb{R}_+}$ of probability measures on $\mathbb{R}^p$ such that $\mu_{t_0} = \mu$.


Proof. Let $(\mu_t)_{t \in \mathbb{R}_+}$ be a convolution semigroup. Then by (9.3.6), for
arbitrary $t \in \mathbb{R}_+$ and $n \in \mathbb{N}$,

$$\mu_t = \mu_{t/n + \cdots + t/n} = \mu_{t/n} * \cdots * \mu_{t/n}$$

(with $n$ summands or factors). Therefore every measure $\mu_t$ is infinitely
divisible. For the proof of the second assertion, let $\mu$ and $t_0$ be given as
above. For $\mu$, by Theorem 9.3.2 there is exactly one continuous mapping
$\varphi: \mathbb{R}^p \to \mathbb{R}$ such that $\varphi(0) = 0$ and $\hat\mu = |\hat\mu| e^{i\varphi}$. By Corollary 9.3.3, for
every $n \in \mathbb{N}$ there is exactly one measure $\mu_{1/n} \in \mathfrak{M}^1(\mathbb{R}^p)$ with Fourier
transform

$$\hat\mu_{1/n} = |\hat\mu|^{1/n} e^{i\varphi/n}.$$

Therefore, for every rational number $r = m/n \ge 0$ with $n \in \mathbb{N}$ and
$m \in \mathbb{N} \cup \{0\}$, $\mu_r = \mu_{1/n} * \cdots * \mu_{1/n}$ ($m$ factors) is the only probability
measure on $\mathbb{R}^p$ such that

$$\hat\mu_r = |\hat\mu|^r e^{ir\varphi}.$$

For every $t \in \mathbb{R}_+$ there is a sequence $(r_n)$ of rational numbers $\ge 0$
converging to $t$. The sequence of functions $|\hat\mu|^{r_n} e^{i r_n \varphi}$, $n = 1, 2, \ldots$, converges
pointwise to the continuous function $|\hat\mu|^t e^{it\varphi}$. By the Continuity Theorem
8.2.7 there is thus exactly one measure $\mu_t \in \mathfrak{M}^1(\mathbb{R}^p)$ such that $\hat\mu_t = |\hat\mu|^t e^{it\varphi}$.
But then $\hat\mu_s \hat\mu_t = \hat\mu_{s+t}$, that is, $\mu_s * \mu_t = \mu_{s+t}$ for all $s, t \in \mathbb{R}_+$, and thus
$(\mu_t)_{t \in \mathbb{R}_+}$ is a convolution semigroup with $\mu_1 = \mu$. Consequently, $(\mu_{t/t_0})_{t \in \mathbb{R}_+}$
is a convolution semigroup which associates the measure $\mu$ with the number
$t_0$. Since $\hat\mu_t = |\hat\mu|^t e^{it\varphi}$, it follows from the Continuity Theorem 8.2.7 that this
convolution semigroup is continuous. To prove uniqueness, let $(\mu_t)_{t \in \mathbb{R}_+}$ be
a continuous convolution semigroup of probability measures where $\mu_{t_0} = \mu$.
Then, since $\mu_{t_0} = \mu_{t_0/n} * \cdots * \mu_{t_0/n}$ ($n$ factors),

$$\hat\mu_{t_0/n} = |\hat\mu|^{1/n} e^{i\varphi/n}$$

for all $n \in \mathbb{N}$. As above, it follows from the continuity of the semigroup that

$$\hat\mu_{t t_0} = |\hat\mu|^t e^{it\varphi} \qquad \text{or} \qquad \hat\mu_t = |\hat\mu|^{t/t_0} e^{i(t/t_0)\varphi}$$

for all $t \in \mathbb{R}_+$. By the Uniqueness Theorem for Fourier transforms, every
$\mu_t$ is therefore uniquely determined by $\mu$. ∎

9.3.7. Corollary. For every convolution semigroup $(\mu_t)_{t \in \mathbb{R}_+}$ of
probability measures on $\mathbb{R}^p$, $\mu_0 = \varepsilon_0$.


Proof. From $\mu_1 = \mu_1 * \mu_0$ it follows that $\hat\mu_1 = \hat\mu_1 \hat\mu_0$. Since $\mu_1$ is infinitely
divisible, $\hat\mu_1(x) \neq 0$ for all $x \in \mathbb{R}^p$ by Theorem 9.3.2. Hence we obtain
$\hat\mu_0 = 1$, and thus, by the Uniqueness Theorem, $\mu_0 = \varepsilon_0$. ∎
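
The construction $\hat\mu_t = |\hat\mu|^t e^{it\varphi}$ from the proof of Theorem 9.3.6 can be made concrete for a Poisson distribution. The sketch below is an illustration under stated assumptions, not part of the text: it takes $\mu$ to be the Poisson distribution with parameter `lam`, whose Fourier transform is $\exp(\lambda(e^{ix} - 1))$, and uses $\varphi(x) = \lambda \sin x$ as the continuous argument with $\varphi(0) = 0$. All function names are hypothetical.

```python
import cmath
import math


def poisson_cf(lam, x):
    """Fourier transform of the Poisson distribution with parameter lam."""
    return cmath.exp(lam * (cmath.exp(1j * x) - 1))


def phi(lam, x):
    """Continuous argument of poisson_cf with phi(0) = 0."""
    return lam * math.sin(x)


def semigroup_cf(lam, t, x):
    """|mu_hat|^t * exp(i*t*phi): candidate Fourier transform of mu_t."""
    return abs(poisson_cf(lam, x)) ** t * cmath.exp(1j * t * phi(lam, x))
```

In this case `semigroup_cf(lam, t, x)` agrees with `poisson_cf(t * lam, x)`, so the semigroup through $\mu$ consists again of Poisson distributions, and the product of the transforms for $s$ and $t$ is the transform for $s + t$.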


Examples 
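2. A particular convolution semigroup $(\mu_t)_{t \in \mathbb{R}_+}$ of probability measures
on $\mathbb{R}^p$ is obtained by defining$^7$

$$\mu_t = \nu_{0,t} \otimes \cdots \otimes \nu_{0,t} \quad (p \text{ factors}), \qquad t \ge 0. \tag{9.3.7}$$

For $s > 0$, $t > 0$, we have (see p. 287, footnote 4)

$$\mu_s * \mu_t = (\nu_{0,s} * \nu_{0,t}) \otimes \cdots \otimes (\nu_{0,s} * \nu_{0,t}) = \nu_{0,s+t} \otimes \cdots \otimes \nu_{0,s+t} = \mu_{s+t}.$$

We call $(\mu_t)_{t \in \mathbb{R}_+}$ the Brownian convolution semigroup on $\mathbb{R}^p$.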


3. The Poisson convolution semigroup on $\mathbb{R}$ is the family $(\pi_t)_{t \in \mathbb{R}_+}$ in which
$\pi_0 = \varepsilon_0$ and $\pi_t$ is the Poisson distribution with parameter $t$ for $t > 0$.
From Section 5.3, Example 2 we see that this is a convolution semigroup.
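
The semigroup property $\pi_s * \pi_t = \pi_{s+t}$ of Example 3 can be checked numerically on the first few point masses. This is a minimal sketch with illustrative names; the truncation level `kmax` is an arbitrary choice and does not affect the masses at the points $0, \ldots, k_{\max}$.

```python
import math


def poisson_pmf(t, k):
    """Mass of the Poisson distribution with parameter t at k (unit mass at 0 if t == 0)."""
    if t == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-t) * t ** k / math.factorial(k)


def convolve(p, q):
    """Convolution of two measures on {0, 1, 2, ...} given as lists of masses."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def semigroup_gap(s, t, kmax=30):
    """Largest deviation between (pi_s * pi_t)({k}) and pi_{s+t}({k}) for k <= kmax."""
    ps = [poisson_pmf(s, k) for k in range(kmax + 1)]
    pt = [poisson_pmf(t, k) for k in range(kmax + 1)]
    conv = convolve(ps, pt)
    return max(abs(conv[k] - poisson_pmf(s + t, k)) for k in range(kmax + 1))
```

A call such as `semigroup_gap(0.7, 1.8)` returns a value at the level of floating-point rounding.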


4. The Cauchy convolution semigroup on $\mathbb{R}$ is the family $(\gamma_t)_{t \in \mathbb{R}_+}$ in which
$\gamma_0 = \varepsilon_0$ and $\gamma_t$ is the Cauchy distribution with parameter $t$ for $t > 0$. See
Section 8.2, Example 4.


Now we derive further properties of infinitely divisible measures which 
will be needed later. 


9.3.8. Lemma. When $\mu$ and $\nu$ are infinitely divisible, $\mu * \nu$ and $T(\mu)$
are also infinitely divisible measures on $\mathbb{R}^p$. Here we let $T$ be an arbitrary
linear mapping of the vector space $\mathbb{R}^p$ into itself.


Proof. For every natural number $n$ there are measures $\mu_n, \nu_n \in \mathfrak{M}^1(\mathbb{R}^p)$
with $\mu_n * \cdots * \mu_n = \mu$ and $\nu_n * \cdots * \nu_n = \nu$. But then $\mu_n * \nu_n$ is a
probability measure and $(\mu_n * \nu_n) * \cdots * (\mu_n * \nu_n) = \mu * \nu$, and thus $\mu * \nu$
is infinitely divisible. The infinite divisibility of $T(\mu)$ is obtained from the
following more general assertion: If $\tau_1, \ldots, \tau_n$ are measures in $\mathfrak{M}_+^b(\mathbb{R}^p)$,
$p \ge 1$, then

$$T(\tau_1 * \cdots * \tau_n) = T(\tau_1) * \cdots * T(\tau_n). \tag{9.3.8}$$

By the Transformation Theorem 2.10.1, for every $\mathfrak{B}^p$-measurable function
$f \ge 0$ on $\mathbb{R}^p$ we have

$$\begin{aligned}
\int f \, dT(\tau_1 * \cdots * \tau_n) &= \int f \circ T \, d(\tau_1 * \cdots * \tau_n) \\
&= \int \cdots \int f \circ T(x_1 + \cdots + x_n)\, \tau_1(dx_1) \cdots \tau_n(dx_n) \\
&= \int \cdots \int f(T(x_1) + \cdots + T(x_n))\, \tau_1(dx_1) \cdots \tau_n(dx_n) \\
&= \int \cdots \int f(y_1 + \cdots + y_n)\, \tau_1'(dy_1) \cdots \tau_n'(dy_n) \\
&= \int f \, d(\tau_1' * \cdots * \tau_n'),
\end{aligned}$$

where we let $\tau_j' = T(\tau_j)$, $j = 1, \ldots, n$. If in particular we choose the
indicator function of an arbitrary set from $\mathfrak{B}^p$ for $f$, then the assertion
follows. ∎


7 For $t > 0$, $\mu_t$ is a particular "$p$-dimensional normal distribution."


The following theorem shows that the set of infinitely divisible measures
on $\mathbb{R}^p$ is closed in $\mathfrak{M}^1(\mathbb{R}^p)$ relative to the vague topology.


9.3.9. Theorem. Suppose a sequence $(\nu_k)_{k \in \mathbb{N}}$ of infinitely divisible
measures on $\mathbb{R}^p$ converges vaguely to a measure $\mu \in \mathfrak{M}^1(\mathbb{R}^p)$. Then $\mu$ is
also infinitely divisible.


Proof. We first show that $\hat\mu$ is nowhere zero on $\mathbb{R}^p$. Since all $\nu_k$ and $\mu$
are probability measures, the sequence $(\nu_k)$ converges weakly to $\mu$ and thus
by the Continuity Theorem $\lim_{k\to\infty} \hat\nu_k(x) = \hat\mu(x)$ and also

$$\lim_{k\to\infty} \sqrt[n]{|\hat\nu_k(x)|^2} = \sqrt[n]{|\hat\mu(x)|^2} \tag{9.3.9}$$

for all $x \in \mathbb{R}^p$ and $n = 1, 2, \ldots$. Here $|\hat\nu_k|^2$ and $|\hat\mu|^2$ are the Fourier
transforms of the measures $\nu_k * S(\nu_k)$ and $\mu * S(\mu)$ respectively, where $S$
again denotes the reflection $x \mapsto -x$. By Lemma 9.3.8, each of the measures
$\nu_k * S(\nu_k)$ is infinitely divisible, that is, by Corollary 9.3.3, $\sqrt[n]{|\hat\nu_k|^2}$ is
the Fourier transform of the measure $\rho_{k,n} \in \mathfrak{M}^1(\mathbb{R}^p)$ whose $n$-fold convolution
product with itself is equal to $\nu_k * S(\nu_k)$. Then, from (9.3.9) and the
second part of the Continuity Theorem, we can conclude that $\sqrt[n]{|\hat\mu|^2}$ is
the Fourier transform of a measure $\rho_n \in \mathfrak{M}^1(\mathbb{R}^p)$ ($n = 1, 2, \ldots$). Hence
$(\hat\rho_n)^n = |\hat\mu|^2$, that is, $\mu * S(\mu)$ is infinitely divisible and thus $|\hat\mu(x)|^2 \neq 0$
and $\hat\mu(x) \neq 0$ for all $x \in \mathbb{R}^p$.

Now, by Corollary A.2, there is exactly one continuous mapping $\varphi:
\mathbb{R}^p \to \mathbb{R}$ such that $\varphi(0) = 0$ and $\hat\mu = |\hat\mu| e^{i\varphi}$. Also, by Theorem 9.3.2, for
each $k = 1, 2, \ldots$ there is exactly one continuous mapping $\varphi_k: \mathbb{R}^p \to \mathbb{R}$
such that $\varphi_k(0) = 0$ and $\hat\nu_k = |\hat\nu_k| e^{i\varphi_k}$. We observed already that for every
$x \in \mathbb{R}^p$, the sequence $(\hat\nu_k(x))$ converges to $\hat\mu(x)$. By the Continuity
Theorem, the convergence is uniform on every compact subset of $\mathbb{R}^p$. But
then $(\hat\nu_k/\hat\mu)$ converges uniformly to 1 on every compact set, and we
conclude that the convergence

$$\lim_{k\to\infty} e^{i(\varphi_k - \varphi)} = 1$$

is uniform on every compact set $\subset \mathbb{R}^p$. Now Corollary A.5 tells us that
$(\varphi_k - \varphi)$ approaches zero uniformly on every compact set $\subset \mathbb{R}^p$. For the
measures $\nu_{k,n} \in \mathfrak{M}^1(\mathbb{R}^p)$ satisfying $\nu_k = \nu_{k,n} * \cdots * \nu_{k,n}$ ($n$ factors) it
thus follows that

$$\lim_{k\to\infty} \hat\nu_{k,n}(x) = \lim_{k\to\infty} \sqrt[n]{|\hat\nu_k(x)|}\; e^{i\varphi_k(x)/n} = \sqrt[n]{|\hat\mu(x)|}\; e^{i\varphi(x)/n}$$

holds for all $x \in \mathbb{R}^p$, by using (9.3.5). By the Continuity Theorem, the
continuous function $\sqrt[n]{|\hat\mu|}\, e^{i\varphi/n}$ is the Fourier transform of a measure
$\mu_n \in \mathfrak{M}^1(\mathbb{R}^p)$. Then $(\hat\mu_n)^n = \hat\mu$; that is, $\mu_n * \cdots * \mu_n = \mu$. Therefore $\mu$
is infinitely divisible. ∎


9.3.10. Corollary. For every measure $\mu \in \mathfrak{M}^1(\mathbb{R}^p)$, the function

$$e^{\hat\mu - 1}$$

is the Fourier transform of a (uniquely determined) infinitely divisible
measure.


Proof. The assertion is true if $\mu$ is a discrete probability measure,
that is, if $\mu$ is of the form

$$\mu = \sum_{j=1}^n \lambda_j \varepsilon_{s_j} \qquad \Bigl(s_j \in \mathbb{R}^p,\ \lambda_j > 0,\ \sum_{j=1}^n \lambda_j = 1\Bigr).$$

Then

$$\hat\mu(x) = \sum_{j=1}^n \lambda_j e^{i\langle x, s_j\rangle}$$

and hence

$$e^{\hat\mu(x) - 1} = \prod_{j=1}^n e^{\lambda_j (e^{i\langle x, s_j\rangle} - 1)} \qquad \text{for all } x \in \mathbb{R}^p. \tag{9.3.10}$$

Now for every $\lambda > 0$ and $y \in \mathbb{R}^p$, we know from Section 8.1, Example 2
that $x \mapsto e^{\lambda(e^{i\langle x, y\rangle} - 1)}$ is the Fourier transform of the probability measure

$$\sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!}\, \varepsilon_{ky}.$$

This is infinitely divisible, as can be seen immediately from the form of
the Fourier transforms or from its relationship to the Poisson distribution
$\pi_\lambda$. Therefore formula (9.3.10) tells us that $e^{\hat\mu - 1}$ is the Fourier transform
of a convolution product of finitely many infinitely divisible measures.
Hence the assertion follows from Lemma 9.3.8 in this case.

The general case is disposed of via Corollary 7.7.4 and Theorem 7.8.4.
According to these, for every $\mu \in \mathfrak{M}^1(\mathbb{R}^p)$ there exists a vaguely and
thus weakly convergent sequence $(\mu_k)$ of discrete probability measures
with limit $\mu$. By the first part of the Continuity Theorem, $(\hat\mu_k)$ then
converges pointwise to $\hat\mu$, and thus $(e^{\hat\mu_k - 1})$ converges pointwise to the
continuous function $e^{\hat\mu - 1}$. The assertion now follows from the second part of
the Continuity Theorem and Theorem 9.3.9. ∎
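
For a discrete measure $\mu$ on the integers, the measure with Fourier transform $e^{\hat\mu - 1}$ can be written as the series $e^{-1} \sum_{k \ge 0} \mu^{*k}/k!$, which follows by expanding the exponential. The sketch below is an illustration only: it truncates this series after `terms` summands (an arbitrary cutoff) and compares Fourier transforms numerically. The helper names and the dictionary representation of measures are assumptions made for the example.

```python
import cmath
import math


def convolve(p, q):
    """Convolution of two discrete measures given as {point: mass} dictionaries."""
    r = {}
    for a, pa in p.items():
        for b, pb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * pb
    return r


def compound_poisson(mu, terms=40):
    """Truncated series e^{-1} * sum_k mu^{*k} / k! (mu^{*0} is the unit mass at 0)."""
    result = {}
    power = {0: 1.0}
    for k in range(terms):
        weight = math.exp(-1) / math.factorial(k)
        for s, m in power.items():
            result[s] = result.get(s, 0.0) + weight * m
        power = convolve(power, mu)
    return result


def fourier(measure, x):
    """Fourier transform of a discrete measure on the real line at x."""
    return sum(m * cmath.exp(1j * x * s) for s, m in measure.items())
```

For instance, with `mu = {-1: 0.25, 2: 0.75}` the value `fourier(compound_poisson(mu), x)` agrees with `cmath.exp(fourier(mu, x) - 1)` up to rounding.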


The preceding theorems, and in particular the last corollary, make it 
possible for us to construct new examples of infinitely divisible measures 
on R?. 


With these tools, we return to the question raised at the beginning of
this section, namely, the question of the role of infinitely divisible
measures among the limit distributions of sums $S_n = X_{n1} + \cdots + X_{nk_n}$
of asymptotically negligible families $(X_{nj})$ where for each $n$, the random
variables $(X_{nj})$ are independent.

It is easy to see that every infinitely divisible measure $\mu \in \mathfrak{M}^1(\mathbb{R})$ occurs
as a vague limit of the distributions $P_{S_n}$ of such sums $S_n$. For this we need
only give a probabilistic interpretation to the definition of infinitely
divisible measures: For $\mu$ and $n = 1, 2, \ldots$, consider the measure $\mu_n \in \mathfrak{M}^1(\mathbb{R})$
uniquely determined from (9.3.3) and temporarily set $\mu_{nj} = \mu_n$ for every
$j = 1, \ldots, n$. Then by Corollary 5.4.5 there is an independent family
$(X_{nj})_{j=1,\ldots,n;\ n=1,2,\ldots}$ of real random variables on a suitable probability space$^8$
$(\Omega, \mathfrak{A}, P)$ such that $P_{X_{nj}} = \mu_{nj}$ for all allowable pairs of indices $n, j$.

Then, in particular, for each $n = 1, 2, \ldots$, $X_{n1}, \ldots, X_{nn}$ are
independent and thus $P_{S_n} = P_{X_{n1}} * \cdots * P_{X_{nn}} = \mu_n * \cdots * \mu_n = \mu$,
that is, the sequence $(P_{S_n})$ is constant, equal to $\mu$. The family $(X_{nj})$ is
asymptotically negligible since $P_{X_{nj}} = \mu_n$ is independent of $j$ and since, by
Corollary 9.3.4, the sequence $(\mu_n)$ converges vaguely to $\varepsilon_0$, that is, by
Theorem 7.7.2, the sequence $(X_{nj})_{n=1,2,\ldots}$ converges stochastically to
0 ($j = 1, 2, \ldots, n$). Since $P\{|X_{nj}| \ge \varepsilon\} = \mu_n(\{x \in \mathbb{R}: |x| \ge \varepsilon\})$, the
convergence is uniform in $j$.


8 For example, $\bigotimes_{n \ge j \ge 1} (\mathbb{R}_{nj}, \mathfrak{B}_{nj}, \mu_{nj})$, where each $\mathbb{R}_{nj} = \mathbb{R}$ and each $\mathfrak{B}_{nj} = \mathfrak{B}^1$, is
such a probability space.

Conversely we show:


9.3.11. Theorem. Let $(X_{nj})_{j=1,\ldots,k_n;\ n=1,2,\ldots}$ be an asymptotically negligible
family of square integrable, real random variables which satisfies the
following conditions:

$$X_{n1}, \ldots, X_{nk_n} \text{ are independent for every } n = 1, 2, \ldots; \tag{9.3.11}$$
$$E(X_{nj}) = 0 \text{ for all } n = 1, 2, \ldots \text{ and } j = 1, \ldots, k_n; \tag{9.3.12}$$
$$\sup_{n=1,2,\ldots} V(X_{n1} + \cdots + X_{nk_n}) < +\infty. \tag{9.3.13}$$

If the sequence $(P_{S_n})$ of distributions of the sums $S_n = X_{n1} + \cdots + X_{nk_n}$
converges vaguely to a measure $\mu \in \mathfrak{M}^1(\mathbb{R})$, then $\mu$ is infinitely divisible.


Proof. We proceed similarly to part 2 of the proof of Theorem 9.2.3.
By Lemma 9.1.2, for every $x \in \mathbb{R}$ we have

$$\max_{1 \le j \le k_n} |\hat\mu_{nj}(x) - 1| \le \tfrac{1}{2}, \qquad \text{for } n \text{ sufficiently large.}$$

Hence, as before (using the principal branch of the logarithm), it follows
that

$$\log \hat\mu_{nj}(x) = \hat\mu_{nj}(x) - 1 + R_{nj}(x)$$

with the majorization

$$|R_{nj}(x)| \le |\hat\mu_{nj}(x) - 1|^2 \le \frac{x^2}{2} V(X_{nj})\, |\hat\mu_{nj}(x) - 1| \le \frac{x^2}{2} V(X_{nj}) \max_{1 \le j \le k_n} |\hat\mu_{nj}(x) - 1|$$

of the remainder term. If we note that $V(S_n) = V(X_{n1}) + \cdots + V(X_{nk_n})$
by the Bienaymé equality and that by hypothesis the sequence $(V(S_n))$ is
bounded from above by a number $\alpha > 0$, we have

$$\Bigl|\sum_{j=1}^{k_n} R_{nj}(x)\Bigr| \le \frac{x^2}{2} V(S_n) \max_{1 \le j \le k_n} |\hat\mu_{nj}(x) - 1| \le \alpha\, \frac{x^2}{2} \max_{1 \le j \le k_n} |\hat\mu_{nj}(x) - 1|.$$

Another application of Lemma 9.1.2 now yields

$$\lim_{n\to\infty} \sum_{j=1}^{k_n} R_{nj}(x) = 0,$$

which as before implies

$$\lim_{n\to\infty} \exp\Bigl(\sum_{j=1}^{k_n} (\hat\mu_{nj}(x) - 1)\Bigr) = \hat\mu(x). \tag{9.3.14}$$

But now, by Corollary 9.3.10,

$$\exp\Bigl(\sum_{j=1}^{k_n} (\hat\mu_{nj} - 1)\Bigr) = \prod_{j=1}^{k_n} e^{\hat\mu_{nj} - 1}$$

is the Fourier transform of a convolution product of $k_n$ infinitely divisible
measures and thus, by Lemma 9.3.8, it is the Fourier transform of an
infinitely divisible measure $\nu_n$ ($n = 1, 2, \ldots$). By the Continuity
Theorem, (9.3.14) tells us that the sequence $(\nu_n)$ converges vaguely to $\mu$.
But then $\mu$ is infinitely divisible by Theorem 9.3.9. ∎
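
The majorization of the remainder rests on the elementary bound $|\log z - (z - 1)| \le |z - 1|^2$ for $|z - 1| \le \tfrac12$, with the principal branch of the logarithm. The sketch below checks this bound on a grid of complex numbers; the radius $\tfrac12$ is taken here as an assumption about the constant used in the proof, and the function names are illustrative.

```python
import cmath
import math


def log_remainder(z):
    """R = log z - (z - 1), using the principal branch of the logarithm."""
    return cmath.log(z) - (z - 1)


def bound_holds(radius=0.5, steps=60):
    """Check |R| <= |z - 1|^2 on a polar grid of points with 0 < |z - 1| <= radius."""
    for i in range(1, steps + 1):
        for k in range(steps):
            w = (radius * i / steps) * cmath.exp(2j * math.pi * k / steps)
            if abs(log_remainder(1 + w)) > abs(w) ** 2 + 1e-15:
                return False
    return True
```

The restriction to a small disc matters: for $z = 0.1$, say, the remainder already exceeds $|z - 1|^2$.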


Remarks. 4. Theorem 9.3.11 remains valid if we drop the hypothesis
(9.3.13) and assume only that the random variables $X_{nj}$ are integrable.
The proof is then carried out as above, but a finer majorization is required.
We omit the details and refer to Gnedenko-Kolmogorov [33].


5. The hypotheses of Theorem 9.3.11 are satisfied, in particular, for the 
families (X,;) considered in Examples 1 to 3 of Section 9.1: In Example 1, 


n 


V(S,) = s V(X;), Hw npo em 
n a 
converges to zero and thus is bounded. In Example 2, $V(S_n) = n p_n (1 - p_n)$
converges to $\alpha$ and thus again is bounded. Finally, in Example 3, $V(S_n) = 1$
for all $n$. On the other hand, the random variables of Example 4 are not
even integrable (see Section 4.4, Example 7).


6. Corollary 9.3.10 is closely related to the so-called Lévy-Khinchin
formula (see [33]), whereby a measure $\nu \in \mathfrak{M}^1(\mathbb{R})$ is infinitely divisible if
and only if there exist a (necessarily uniquely determined) measure
$\mu \in \mathfrak{M}_+^b(\mathbb{R})$ and a (also uniquely determined) $\gamma \in \mathbb{R}$ such that

$$\hat\nu(x) = e^{i\gamma x + M(x)}, \qquad \text{for all } x \in \mathbb{R}, \tag{9.3.15}$$

where

$$M(x) = \int \Bigl(e^{ixy} - 1 - \frac{ixy}{1 + y^2}\Bigr) \frac{1 + y^2}{y^2}\, \mu(dy). \tag{9.3.16}$$

The integrand in (9.3.16) has the value $-x^2/2$ at $y = 0$ since one requires
continuity of the integrand.$^9$


9 A proof of the Lévy-Khinchin formula based on the "representation theorem of
G. Choquet" can be found in Johansen [37].
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PROBLEMS 


1. Let $\mu$ be an infinitely divisible probability measure on $\mathbb{R}^p$ and $T:
\mathbb{R}^p \to \mathbb{R}^q$ a linear mapping of the vector space $\mathbb{R}^p$ into the vector space
$\mathbb{R}^q$. Prove that $T(\mu)$ is an infinitely divisible probability measure on $\mathbb{R}^q$.

2. Decide whether the so-called triangular distribution on $\mathbb{R}$, that is,
the probability measure $t\lambda^1$ with density


t(x) = 0, lc 
is infinitely divisible. 

3. Prove that the Gamma distribution $f_t\lambda^1$ with parameter $t > 0$, defined
in Section 8.2, Problem 3, is an infinitely divisible probability measure
on $\mathbb{R}$. For given $t_0 > 0$, determine the continuous convolution
semigroup $(\mu_t)_{t \in \mathbb{R}_+}$ of probability measures on $\mathbb{R}$ such that $\mu_{t_0} = f_{t_0}\lambda^1$.

4. Investigate whether the convolution semigroups of Examples 2 to 4
are continuous.

5. Let $(\mu_t)_{t \in \mathbb{R}_+}$ be a convolution semigroup. Prove that $(\mu_t)$ is continuous if
and only if $\lim_{t \to 0} \mu_t = \varepsilon_0$ holds in the vague topology.


9.4 CHARACTERIZATION OF THE NORMAL DISTRIBUTION 
(STABLE DISTRIBUTIONS) 


Because of the Central Limit Theorem, the normal distribution 
assumes a particularly important role among the infinitely divisible 
measures on R. It is therefore natural to ask for simple characterizations 
of the normal distribution in the set of all probability measures on R. 
One answer to this question will be given in this section. 

To this end, we consider the group $\mathfrak{T}$ of all mappings of $\mathbb{R}$ into itself of
the form

$$x \mapsto rx + \beta \qquad (r > 0,\ \beta \in \mathbb{R}). \tag{9.4.1}$$

For every normal distribution $\nu_{\alpha,\sigma^2}$ with parameters $\alpha$ and $\sigma^2$, $\nu_{r\alpha+\beta,(r\sigma)^2}$ is
then obviously the image of $\nu_{\alpha,\sigma^2}$ under the mapping (9.4.1). Therefore
the set $\mathfrak{T}(\nu_{\alpha,\sigma^2})$ of all images $T(\nu_{\alpha,\sigma^2})$ of $\nu_{\alpha,\sigma^2}$ under the mappings $T \in \mathfrak{T}$ is
the set of all normal distributions and is thus stable relative to convolution,
that is, for every two measures $\nu_1, \nu_2 \in \mathfrak{T}(\nu_{\alpha,\sigma^2})$, $\nu_1 * \nu_2$ also lies in this set
(see Section 5.3, Example 3). Therefore every normal distribution is stable
in the sense of the following definition:

9.4.1. Definition. A measure $\mu \in \mathfrak{M}^1(\mathbb{R})$ is said to be stable (or a
stable distribution) if the set $\mathfrak{T}(\mu)$ of all images $T(\mu)$ with $T \in \mathfrak{T}$ contains
the convolution product of any two of its elements.


Since $\mathfrak{T}$ is a group of mappings, then of course, when $\mu$ is stable every
other element from $\mathfrak{T}(\mu)$ is also a stable probability measure on $\mathbb{R}$.


Examples 


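1. Every unit mass $\varepsilon_a$, $a \in \mathbb{R}$, is stable, since $\mathfrak{T}(\varepsilon_a)$ is the set of all measures
$\varepsilon_x$ with $x \in \mathbb{R}$.


2. The Cauchy distribution $\gamma_1$ (and hence every Cauchy distribution
$\gamma_\alpha$, $\alpha > 0$) is stable. For we have that

$$\mathfrak{T}(\gamma_1) = \{\gamma_r * \varepsilon_\beta : r > 0,\ \beta \in \mathbb{R}\}$$

is stable with respect to convolution (compare Section 8.2, Example 3).


3. The Poisson distribution $\pi_1$ is not stable. For the mapping $x \mapsto
T(x) = \tfrac{1}{2}x$ we have [see (9.3.8)]

$$T(\pi_1) * T(\pi_1) = T(\pi_1 * \pi_1) = T(\pi_2) = \sum_{n=0}^{\infty} e^{-2} \frac{2^n}{n!}\, \varepsilon_{n/2}.$$

On the other hand, the image $\pi'$ of $\pi_1$ under the mapping $x \mapsto rx + \beta$ from
$\mathfrak{T}$ equals:

$$\pi' = \sum_{n=0}^{\infty} e^{-1} \frac{1}{n!}\, \varepsilon_{rn+\beta}.$$

Therefore $T(\pi_1) * T(\pi_1) \notin \mathfrak{T}(\pi_1)$.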
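
One way to see the last claim of Example 3 concretely: a mapping $x \mapsto rx + \beta$ with $r > 0$ is injective, so it only relocates the point masses of $\pi_1$ without changing their sizes. The largest point mass of any image of $\pi_1$ is therefore $e^{-1}$, whereas the largest point mass of the image of $\pi_2$ is $2e^{-2}$. The following sketch compares these two numbers; the function names and the cutoff `kmax` are illustrative choices.

```python
import math


def poisson_masses(t, kmax=25):
    """Point masses e^{-t} t^k / k! of the Poisson distribution for k = 0, ..., kmax."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(kmax + 1)]


def largest_mass(t):
    """Largest point mass; it is unchanged by any injective mapping of the real line."""
    return max(poisson_masses(t))
```

Since `largest_mass(1.0)` is about 0.368 and `largest_mass(2.0)` is about 0.271, no image of $\pi_1$ under a mapping of the form (9.4.1) can coincide with the image of $\pi_2$.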


The last example shows that not every infinitely divisible measure on R 
is stable. Conversely, however, we have: 


9.4.2. Theorem. Every stable measure $\mu \in \mathfrak{M}^1(\mathbb{R})$ is infinitely
divisible.


Proof. For every $n = 1, 2, \ldots$, the $n$-fold convolution product
$\mu * \cdots * \mu$ of $\mu$ with itself lies in $\mathfrak{T}(\mu)$. Hence there is a mapping $x \mapsto
T_r(x) = rx$ from $\mathfrak{T}$ and a number $\beta \in \mathbb{R}$ such that $\mu * \cdots * \mu =
T_r(\mu) * \varepsilon_\beta$. By (9.3.8), since $T_{1/r} = T_r^{-1}$, it now follows that

$$\mu = T_{1/r}(\mu) * \cdots * T_{1/r}(\mu) * T_{1/r}(\varepsilon_{-\beta}).$$

Therefore, $\mu_n = T_{1/r}(\mu * \varepsilon_{-\beta/n})$ is a measure from $\mathfrak{M}^1(\mathbb{R})$ with $\mu =
\mu_n * \cdots * \mu_n$. ∎


The normal distributions $\nu_{\alpha,\sigma^2}$ are not the only stable probability
measures on $\mathbb{R}$, as is shown by the above examples. Nonetheless, they can
be characterized by a simple condition in addition to stability:


9.4.3. Theorem. A measure $\mu \in \mathfrak{M}^1(\mathbb{R})$ is a normal distribution if
and only if it is stable and if its variance

$$V(\mu) = \int \Bigl(x - \int x\, \mu(dx)\Bigr)^2 \mu(dx) \tag{9.4.2}$$

is finite and positive.$^{10}$


Proof. By what was shown in Section 4.4, (6), we need to prove
only that a stable measure $\mu \in \mathfrak{M}^1(\mathbb{R})$ with variance $0 < V(\mu) < +\infty$
is a normal distribution. If we set $\alpha = \int x\, \mu(dx)$ and $\sigma = \sqrt{V(\mu)}$, then
$x \mapsto T(x) = \frac{1}{\sigma}(x - \alpha)$ is a mapping from $\mathfrak{T}$, that is, $T(\mu) \in \mathfrak{T}(\mu)$.
Obviously, $\int x\, dT(\mu) = 0$ and $V(T(\mu)) = 1$. For the proof we can thus
assume that $\int x\, \mu(dx) = 0$ and $V(\mu) = 1$. Then due to the stability of $\mu$,
there is a transformation $x \mapsto T(x) = rx + \beta$ from $\mathfrak{T}$ such that $\mu * \mu =
T(\mu)$. Since $\int x\, (\mu * \mu)(dx) = 0$ and $\int x^2\, (\mu * \mu)(dx) = 2$, we must have
$\beta = 0$ and $r = \sqrt{2}$. Therefore, for the Fourier transforms of the measures
we are considering we have, by (8.1.10),

$$[\hat\mu(x)]^2 = \hat\mu(\sqrt{2}\,x), \qquad \text{for all } x \in \mathbb{R},$$

or the equivalent functional equation

$$\hat\mu(x) = \Bigl[\hat\mu\Bigl(\frac{x}{\sqrt{2}}\Bigr)\Bigr]^2 \qquad (x \in \mathbb{R}). \tag{9.4.3}$$

By Theorems 9.3.2 and 9.4.2, $\hat\mu(x) \neq 0$ for all $x \in \mathbb{R}$;$^{11}$ consequently, by
Corollary A.2, there is exactly one continuous mapping $\varphi: \mathbb{R} \to \mathbb{C}$ such
that $\varphi(0) = 0$ and $\hat\mu = e^{\varphi}$. Since $\hat\mu$ is twice continuously differentiable by
Theorem 8.3.3, this is also true for $\varphi$ by Corollary A.3. By Theorem
8.3.3, $\hat\mu'(0) = 0$ and $\hat\mu''(0) = -1$, that is, $\varphi'(0) = 0$ and $\varphi''(0) = -1$.
From (9.4.3) it follows that

$$\varphi(x) = 2\varphi\Bigl(\frac{x}{\sqrt{2}}\Bigr) = \cdots = 2^n \varphi\Bigl(\frac{x}{\sqrt{2^n}}\Bigr) \tag{9.4.4}$$

for all $x \in \mathbb{R}$ and $n = 1, 2, \ldots$. Therefore, for $x \neq 0$ it follows that

$$\frac{\varphi(x)}{x^2} = \frac{\varphi(x/\sqrt{2^n})}{(x/\sqrt{2^n})^2} \qquad \text{for all } n = 1, 2, \ldots,$$

and hence for $n \to \infty$ we have the equality $\varphi(x)/x^2 = \tfrac{1}{2}\varphi''(0) = -\tfrac{1}{2}$ by
Taylor's formula. Therefore, $\varphi(x) = -x^2/2$, that is, $\hat\mu(x) = e^{-x^2/2}$ for all
$x \in \mathbb{R}$, and thus $\mu = \nu_{0,1}$. ∎


10 The integrability of $x \mapsto x^2$ and the positivity of $V(\mu)$ are equivalent to this.


11 This also follows directly from (9.4.3). $\hat\mu(x) = 0$ would imply $\hat\mu(x/\sqrt{2^n}) = 0$ for
all $n = 1, 2, \ldots$. For $n \to \infty$ this yields a contradiction in the form $\hat\mu(0) = 0$.
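
The functional equation $\hat\mu(x) = [\hat\mu(x/\sqrt2)]^2$ singles out the standard normal distribution among the centered stable laws with unit variance. As a small numerical illustration (the function names are not from the text), the sketch below evaluates the gap in this equation for the Fourier transform $e^{-x^2/2}$ of $\nu_{0,1}$ and, for contrast, for the Fourier transform $e^{-|x|}$ of the Cauchy distribution $\gamma_1$, which is stable but has no finite variance.

```python
import math


def normal_cf(x):
    """Fourier transform of the standard normal distribution."""
    return math.exp(-x * x / 2)


def cauchy_cf(x):
    """Fourier transform of the Cauchy distribution with parameter 1."""
    return math.exp(-abs(x))


def functional_gap(cf, x):
    """|cf(x) - cf(x / sqrt(2))^2|, which vanishes when the functional equation holds at x."""
    return abs(cf(x) - cf(x / math.sqrt(2)) ** 2)
```

The gap is zero up to rounding for `normal_cf` and clearly positive for `cauchy_cf`, in line with the fact that $\gamma_1 * \gamma_1$ is the image of $\gamma_1$ under $x \mapsto 2x$ rather than $x \mapsto \sqrt2\,x$.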


The following corollary is essentially a translation of this theorem into 
the language of probability theory: 


9.4.4. Corollary. Let $X$ and $Y$ be independent real random variables
with the same distribution $\mu$, expected value 0 and finite positive variance.
Then $\mu$ is a normal distribution if and only if $\mu$ is also the distribution of

$$\frac{1}{\sqrt{2}}(X + Y).$$


Proof. Let $\mu$ be the distribution of $(1/\sqrt{2})(X + Y)$. By suitable
normalization we can assume that $V(X) = V(Y) = 1$. But then $x \mapsto
\hat\mu(x/\sqrt{2})^2$ is the characteristic function of $(1/\sqrt{2})(X + Y)$, and thus
$\hat\mu(x) = \hat\mu(x/\sqrt{2})^2$ for all $x \in \mathbb{R}$. Therefore we again arrive at (9.4.3).
Hence the above proof shows that $\mu = \nu_{0,1}$. The converse follows
immediately from Section 5.3, Example 3. ∎


Remark. With the help of further theorems from Fourier analysis, it
can be shown that for every $\alpha \in\ ]0,2]$, there exists a measure $\mu_\alpha \in \mathfrak{M}^1(\mathbb{R})$
having a continuous density $f_\alpha$ with respect to $\lambda^1$ such that

$$\hat\mu_\alpha(x) = e^{-|x|^\alpha} \qquad (x \in \mathbb{R}).^{12}$$

It follows from this form of the Fourier transform that $\mu_\alpha$ is stable. The set
of Fourier transforms of all measures from $\mathfrak{T}(\mu_\alpha)$ is indeed given by

$$x \mapsto e^{i\beta x} e^{-r^\alpha |x|^\alpha} \qquad (r > 0,\ \beta \in \mathbb{R})$$

and thus is obviously stable with respect to ordinary multiplication. The
functions $f_\alpha$ are called stable symmetric densities of order $\alpha$, $0 < \alpha \le 2$.
For $\alpha = 1$ ($\alpha = 2$) we obviously have $\mu_1 = \gamma_1$ ($\mu_2 = \nu_{0,2}$), that is, a Cauchy
(normal) distribution.
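
The closure under multiplication claimed in the Remark can be made explicit: the product of the transforms with parameters $(r_1, \beta_1)$ and $(r_2, \beta_2)$ is the transform with parameters $r = (r_1^\alpha + r_2^\alpha)^{1/\alpha}$ and $\beta = \beta_1 + \beta_2$. The sketch below checks this identity numerically; the function names are illustrative and not part of the text.

```python
import cmath


def stable_cf(x, alpha, r=1.0, beta=0.0):
    """Fourier transform x -> exp(i*beta*x) * exp(-(r*|x|)^alpha) of an image of mu_alpha."""
    return cmath.exp(1j * beta * x - (r * abs(x)) ** alpha)


def combine(alpha, r1, beta1, r2, beta2):
    """Parameters (r, beta) of the product of two such Fourier transforms."""
    r = (r1 ** alpha + r2 ** alpha) ** (1 / alpha)
    return r, beta1 + beta2
```

For $\alpha = 2$ the rule for $r$ is the familiar addition of variances, and for $\alpha = 1$ it is the addition of the Cauchy parameters.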


PROBLEM


Let $\mu$ be a stable distribution on $\mathbb{R}$. Determine the continuous
convolution semigroup $(\mu_t)_{t \in \mathbb{R}_+}$ of probability measures on $\mathbb{R}$ which satisfies
$\mu_1 = \mu$.


12 See Meyer [39] and Blumenthal-Getoor [24].


CONDITIONAL EXPECTATIONS


This chapter deals with the concept of conditional expectation which
is basic to modern probability theory. The approach to it is based on the
Radon-Nikodym Theorem of Section 2.9. Our considerations again refer
to an arbitrary probability space $(\Omega, \mathfrak{A}, P)$.


10.1 CONDITIONAL EXPECTATIONS AND PROBABILITIES 


We consider an integrable real random variable $X$ on $\Omega$. As we know, $X$
can be viewed as a mathematical model for describing an experiment
with random outcome in which real measurements are observed. In the
sense of this interpretation, $X$ then contains complete information with
regard to the observed outcome of the experiment. On the other hand,
the expected value $E(X)$ contains only an insignificant amount of
information about $X$. If we also interpret $E(X)$ as a real random variable, namely
the constant variable $\omega \mapsto E(X)$ on $\Omega$, then $E(X)$ is measurable with
respect to every σ-subalgebra $\mathfrak{Z}$ of $\mathfrak{A}$,$^1$ and in particular with respect to
the smallest σ-subalgebra which contains only $\emptyset$ and $\Omega$ as elements. On
the other hand, $X$ is $\mathfrak{A}$-measurable and in general not measurable with
respect to smaller σ-subalgebras of $\mathfrak{A}$. We can therefore think of measuring
the amount of information about $X$ contained in a real random variable $Y$
by σ-subalgebras $\mathfrak{Z}$ of $\mathfrak{A}$ with respect to which $Y$ is measurable.

1 By this we mean, of course, every σ-algebra $\mathfrak{Z}$ in $\Omega$ with $\mathfrak{Z} \subset \mathfrak{A}$.

In this regard, the concept of conditional probability treated earlier
(Section 4.2) now also has a suitable place. For this, let $\Omega = \bigcup_{i \in I} B_i$ be a
decomposition of $\Omega$ into finitely many ($I = \{1, \ldots, n\}$) or countably
infinitely many ($I = \mathbb{N}$) pairwise disjoint events $B_i \in \mathfrak{A}$ with probabilities
$P(B_i) > 0$. Then the conditional probability under the hypothesis $B_i$
is the following probability measure $P_{B_i}$ on $\mathfrak{A}$:

$$P_{B_i}(A) = \frac{P(A \cap B_i)}{P(B_i)}$$

for all $A \in \mathfrak{A}$; that is, we have

$$P_{B_i}(A) = \frac{1}{P(B_i)} \int_A 1_{B_i}\, dP, \qquad \text{or} \qquad P_{B_i} = \frac{1}{P(B_i)}\, (1_{B_i} P). \tag{10.1.1}$$

The expected value of $X$ with respect to this probability measure, that is,
the number

$$E_{B_i}(X) = \int X\, dP_{B_i} = \frac{1}{P(B_i)} \int_{B_i} X\, dP \tag{10.1.2}$$

is called the expected value of $X$ given $B_i$ ($i \in I$). Thus we arrive at a new
random variable

$$X_0 = \sum_{i \in I} E_{B_i}(X)\, 1_{B_i} \tag{10.1.3}$$

which, on every $B_i$, is constant, equal to $E_{B_i}(X)$. In the case $I = \{1\}$, this
is just the constant random variable $\omega \mapsto E(X)$, because $B_1 = \Omega$. The
variable $X_0$ is measurable with respect to the σ-subalgebra $\mathfrak{Z} =
\mathfrak{A}(B_i;\ i \in I)$ generated by all the $B_i$, $i \in I$. It consists of all sets of
the form $\bigcup_{i \in J} B_i$ where $J$ runs through all elements of the power set $\mathfrak{P}(I)$.

Now we have $\int_{B_i} X_0\, dP = E_{B_i}(X) P(B_i) = \int_{B_i} X\, dP$ for all $i \in I$. Due
to the special form of $\mathfrak{Z}$ it then follows that

$$\int_Z X_0\, dP = \int_Z X\, dP, \qquad \text{for all } Z \in \mathfrak{Z}. \tag{10.1.4}$$

This elucidates the close relationship between $X_0$ and $X$, to be expected
from the construction of $X_0$ through the σ-subalgebra $\mathfrak{Z}$. If instead of $X_0$
we choose the random variable $\omega \mapsto E(X)$, then (10.1.4) is satisfied for all
events $Z$ of the σ-subalgebra $\{\emptyset, \Omega\}$ relative to which $E(X)$ is measurable.
When $X_0 = X$, (10.1.4) holds for all $Z \in \mathfrak{A}$.

We shall now see that in the three given cases, the random variable $X_0$
is $P$-almost surely uniquely determined by the measurability relative to
the σ-subalgebra $\mathfrak{Z}$ being considered and by the validity of (10.1.4). The
special form of $\mathfrak{Z}$ is even without significance for this. Through the
consideration of σ-subalgebras $\mathfrak{Z}$ of $\mathfrak{A}$ we thus arrive at random variables $X_0$
which are more or less closely related to $X$. We can interpret the σ-algebra
$\mathfrak{Z}$ as a measure of the information contained in $X_0$ with regard to $X$.

10.1.1. Theorem. Let $X$ be a numerical random variable on $(\Omega, \mathfrak{A}, P)$
which is $\ge 0$ (or integrable). Then for every σ-subalgebra $\mathfrak{Z}$ of $\mathfrak{A}$ there
exists, up to almost sure equality, exactly one nonnegative (or integrable)
numerical random variable $X_0$ on $\Omega$ which is $\mathfrak{Z}$-measurable and satisfies
the condition

$$\int_Z X_0\, dP = \int_Z X\, dP, \qquad \text{for all } Z \in \mathfrak{Z}. \tag{10.1.4}$$

If $X$ is integrable and $\ge 0$, then $X_0 \ge 0$ almost surely.

Proof. First suppose $X \ge 0$. We use $P_0$ and $Q$ to denote the
restrictions of the measures $P$ and $XP$, respectively, to $\mathfrak{Z}$. Then $P_0$ and $Q$ are
measures on $\mathfrak{Z}$; $P_0$ is a probability measure and is thus finite. Since

$$Q(Z) = \int_Z X\, dP,$$

$Q(Z) = 0$ for every $Z \in \mathfrak{Z}$ with $P_0(Z) = P(Z) = 0$, that is, $Q$ is
$P_0$-continuous. Therefore, by the Radon-Nikodym Theorem, $Q$ has a density
relative to $P_0$, that is, there exists a $\mathfrak{Z}$-measurable numerical function
$X_0 \ge 0$ on $\Omega$ such that $Q = X_0 P_0$, that is, such that $\int_Z X_0\, dP_0 = \int_Z X\, dP$
for all $Z \in \mathfrak{Z}$. By the definition of integrals of measurable functions $\ge 0$
we have $\int f\, dP_0 = \int f\, dP$ for every $\mathfrak{Z}$-measurable function $f \ge 0$.
Therefore $\int_Z X_0\, dP_0 = \int_Z X_0\, dP$ and hence $\int_Z X_0\, dP = \int_Z X\, dP$ for every
$Z \in \mathfrak{Z}$. The random variable $X_0$ thus yields the desired result. Moreover,
by Theorem 2.9.9, $X_0$ is a density and is therefore $P_0$-almost surely
uniquely determined. Since the set $\{X_0 \neq X_0'\}$ lies in $\mathfrak{Z}$ for every
$\mathfrak{Z}$-measurable numerical random variable $X_0'$, it follows that $X_0 = X_0'$ $P$-almost
surely for every $\mathfrak{Z}$-measurable random variable $X_0' \ge 0$ satisfying
condition (10.1.4).

If $X$ is integrable and of arbitrary sign, we decompose $X$ into its
positive part $X^+$ and negative part $X^-$. From what has just been proved
follows the existence of $\mathfrak{Z}$-measurable numerical random variables $X_0' \ge 0$
and $X_0'' \ge 0$ such that

$$\int_Z X_0'\, dP = \int_Z X^+\, dP \qquad \text{and} \qquad \int_Z X_0''\, dP = \int_Z X^-\, dP,$$

for all $Z \in \mathfrak{Z}$. If in particular we set $Z = \Omega$, we see that since $X^+$ and
$X^-$ are integrable, $X_0'$ and $X_0''$ are also integrable and hence almost
surely finite. Thus we can assume $X_0'$, $X_0''$ to be real-valued without loss
of generality. But then $X_0 = X_0' - X_0''$ is an integrable solution of our
problem. For every other integrable solution $Y_0$ we have, for arbitrary
$Z \in \mathfrak{Z}$,

$$\int_Z (X_0' + Y_0^-)\, dP = \int_Z (X_0'' + Y_0^+)\, dP;$$

hence, $(X_0' + Y_0^-) P_0 = (X_0'' + Y_0^+) P_0$ due to the $\mathfrak{Z}$-measurability of all
integrands. By Theorem 2.9.9 again, we thus have $X_0' + Y_0^- = X_0''
+ Y_0^+$ $P_0$-almost surely and therefore also $P$-almost surely. Hence we
obtain $X_0 = Y_0$ $P$-almost surely. Finally, if $X$ is integrable and $\ge 0$, then
we still have to show that $X_0 \ge 0$ almost surely. But by (10.1.4),
$\int_Z X_0\, dP \ge 0$ for every $Z \in \mathfrak{Z}$, and in particular for $Z = \{X_0 < 0\}$. For
this $Z$ then, $\int_Z X_0\, dP = 0$ and hence $P(Z) = 0$ since $X_0$ is strictly
negative on $Z$. ∎
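
For a finite decomposition of a finite sample space, the random variable $X_0$ of (10.1.3) and the defining property (10.1.4) can be computed directly. The following sketch uses exact rational arithmetic; the function names and the representation of $X$, $P$ and the decomposition as dictionaries and sets are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import combinations


def integral(f, P, Z):
    """Integral of f over the event Z with respect to P (finite sample space)."""
    return sum(f[w] * P[w] for w in Z)


def cond_exp_partition(X, P, blocks):
    """X_0 = sum_i E_{B_i}(X) 1_{B_i} for a decomposition into blocks with P(B_i) > 0."""
    X0 = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        eB = integral(X, P, B) / pB  # expected value of X given B
        for w in B:
            X0[w] = eB
    return X0


def satisfies_defining_property(X, X0, P, blocks):
    """Check (10.1.4) for every union Z of blocks, including the empty union."""
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            Z = set().union(*chosen)
            if integral(X0, P, Z) != integral(X, P, Z):
                return False
    return True
```

For a fair die with $X(\omega) = \omega$ and the decomposition $\{1,2\}$, $\{3,4,5\}$, $\{6\}$, the function returns the values $3/2$, $4$ and $6$ on the three blocks, and the check of (10.1.4) succeeds for all eight events of the generated σ-subalgebra.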


10.1.2. Definition. Under the conditions of Theorem 10.1.1, the
random variable $X_0$ is called the conditional expectation of $X$ given $\mathfrak{Z}$; in
symbols,

$$E^{\mathfrak{Z}}(X) = E(X \mid \mathfrak{Z}) = X_0. \tag{10.1.5}$$

Hence $E^{\mathfrak{Z}}(X)$ is a $\mathfrak{Z}$-measurable numerical random variable such that

$$\int_Z E^{\mathfrak{Z}}(X)\, dP = \int_Z X\, dP, \qquad \text{for all } Z \in \mathfrak{Z}. \tag{10.1.4'}$$

$E^{\mathfrak{Z}}(X)$ is only $P$-almost surely uniquely determined by (10.1.4'). We
therefore also speak of different, but $P$-almost surely equal versions of the
conditional expectation. Therefore, results on conditional expectations in
general hold only $P$-almost surely. The transition from $X$ to $E^{\mathfrak{Z}}(X)$ can
be interpreted as a "smoothing" of $X$ relative to $\mathfrak{Z}$. This interpretation
is primarily suggested by properties (10.1.16) and (10.1.17), still to be
proved.


If $(Y_i)_{i \in I}$ is a family of random variables $Y_i: (\Omega, \mathfrak{A}) \to (\Omega_i, \mathfrak{A}_i)$ with
values in measurable spaces $(\Omega_i, \mathfrak{A}_i)$, $i \in I$, and $\mathfrak{Z} = \mathfrak{A}(Y_i;\ i \in I)$ is the
σ-subalgebra generated by these random variables, then we also write

$$E^{Y_i;\, i \in I}(X) \qquad \text{or} \qquad E(X \mid Y_i;\ i \in I)$$

for $E^{\mathfrak{Z}}(X)$ and speak of the conditional expectation of $X$ given $(Y_i)_{i \in I}$ or
the conditional expectation of $X$ for given random variables $Y_i$, $i \in I$. If $I$
consists of only $n$ elements: $I = \{1, \ldots, n\}$, then we also write

$$E^{Y_1, \ldots, Y_n}(X) \qquad \text{or} \qquad E(X \mid Y_1, \ldots, Y_n).$$


Examples 


The three introductory, motivating examples tell us the following in 
the new terminology: 


1. If $\mathfrak{Z}$ is the σ-algebra consisting only of $\emptyset$ and $\Omega$, then $E^{\mathfrak{Z}}(X) = E(X)$
almost surely for every admissible random variable $X$ on $(\Omega, \mathfrak{A}, P)$. (We say
that $X$ is admissible if $X$ is $\ge 0$ or integrable.)

2. $E^{\mathfrak{A}}(X) = X$ almost surely for every admissible random variable $X$.


3. If $\mathfrak{Z}$ is generated by a decomposition $\Omega = \bigcup_{i \in I} B_i$ of $\Omega$ into sets $B_i \in \mathfrak{A}$ as
above ($I$ finite or countably infinite), then

$$E^{\mathfrak{Z}}(X) = \sum_{i \in I} E_{B_i}(X)\, 1_{B_i}, \qquad P\text{-almost surely}$$

provided $P(B_i) > 0$ for all $i$. If we drop this condition, then the above
equality still holds if we take $E_{B_i}(X)$ to be some arbitrarily chosen number
for events $B_i$ with $P(B_i) = 0$. Here $X$ is again an admissible random
variable.


4. Let $(\Omega, \mathfrak{A}, P)$ be a probability space and $G$ a group of finite order $n$ of
measurable mappings $g: (\Omega, \mathfrak{A}) \to (\Omega, \mathfrak{A})$ each of which leaves $P$ invariant,
that is, we have $g(P) = P$ for all $g \in G$. The system $\mathfrak{Z}$ of all sets $Z \in \mathfrak{A}$
which are $G$-invariant, that is, for which $g(Z) = Z$ for arbitrary $g \in G$,
is a σ-subalgebra of $\mathfrak{A}$. Then for every admissible random variable $X$,

$$E^{\mathfrak{Z}}(X) = \frac{1}{n} \sum_{g \in G} X \circ g, \qquad P\text{-almost surely.}$$

Indeed, $X_0 = (1/n) \sum_{g \in G} X \circ g$ is $\mathfrak{Z}$-measurable, since $X_0$ is $\mathfrak{A}$-measurable
and $X_0 \circ h = X_0$ for all $h \in G$. Moreover, for every $Z \in \mathfrak{Z}$,

$$\int_Z X_0\, dP = \frac{1}{n} \sum_{g \in G} \int_Z X \circ g\, dP = \frac{1}{n} \sum_{g \in G} \int_{g(Z)} X\, dg(P) = \int_Z X\, dP.$$
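
Example 4 can be illustrated on a small finite model. The sketch below is an assumption-laden toy case, not taken from the text: the sample space is $\{0, \ldots, 5\}$ with the uniform probability, and $G$ is the cyclic group of order 3 generated by the rotation $\omega \mapsto (\omega + 2) \bmod 6$, which leaves the uniform probability invariant. The $G$-invariant events are the unions of the two orbits $\{0, 2, 4\}$ and $\{1, 3, 5\}$.

```python
from fractions import Fraction

OMEGA = range(6)


def group_elements():
    """The cyclic group of order 3 acting on {0, ..., 5} by w -> (w + 2k) mod 6."""
    return [lambda w, k=k: (w + 2 * k) % 6 for k in range(3)]


def group_average(X):
    """X_0 = (1/n) * sum over g in G of X o g, as a dictionary on OMEGA."""
    G = group_elements()
    return {w: sum(Fraction(X[g(w)]) for g in G) / len(G) for w in OMEGA}


def integral(f, Z):
    """Integral of f over Z with respect to the uniform probability on OMEGA."""
    return sum(Fraction(f[w]) for w in Z) * Fraction(1, 6)
```

The averaged variable is constant on each orbit, and its integral over every invariant event agrees with that of $X$, which is exactly the defining property of the conditional expectation given the invariant σ-subalgebra.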
Remark. 1. For numerical random variables $X$ on $(\Omega, \mathfrak{A}, P)$ for which
either $X^+$ or $X^-$ is integrable, that is, the integral $\int X\, dP$ exists in the sense
of the remark following Definition 2.4.1, the conditional expectation can
still be defined easily. The details are left to the reader.


The following properties (10.1.6) — (10.1.10) of conditional expectation 
can be obtained directly from the definition and from Theorem 10.1.1. 
Therefore we omit the proof. Here X and Y are always numerical ran- 
dom variables on (0,2,P) which are both 20 or both integrable. 


    $E(E^{\mathfrak{C}}(X)) = E(X)$, (10.1.6)

    X $\mathfrak{C}$-measurable $\Rightarrow$ $E^{\mathfrak{C}}(X) = X$, almost surely, (10.1.7)

    X = Y almost surely $\Rightarrow$ $E^{\mathfrak{C}}(X) = E^{\mathfrak{C}}(Y)$, almost surely, (10.1.8)

    X = const = $\alpha$ $\Rightarrow$ $E^{\mathfrak{C}}(X) = \alpha$, almost surely, (10.1.9)

    $E^{\mathfrak{C}}(\alpha X + \beta Y) = \alpha E^{\mathfrak{C}}(X) + \beta E^{\mathfrak{C}}(Y)$, almost surely
    ($\alpha, \beta \in \mathbb{R}_+$ or $\alpha, \beta \in \mathbb{R}$), (10.1.10)

    $X \le Y$ almost surely $\Rightarrow$ $E^{\mathfrak{C}}(X) \le E^{\mathfrak{C}}(Y)$, almost surely. (10.1.11)

Because of (10.1.8), we can assume $X \le Y$ on all of $\Omega$. But then there is a
random variable $Z \ge 0$ with Y = X + Z, and the desired property
follows from (10.1.10) since $E^{\mathfrak{C}}(Z) \ge 0$ almost surely.


    $|E^{\mathfrak{C}}(X)| \le E^{\mathfrak{C}}(|X|)$, almost surely. (10.1.12)


This is obvious in the case $X \ge 0$. For integrable X, we decompose X
into positive and negative parts.

For every isotone sequence $(X_n)_{n \in \mathbb{N}}$ of random variables $X_n \ge 0$,

    $\sup_n E^{\mathfrak{C}}(X_n) = E^{\mathfrak{C}}(\sup_n X_n)$, almost surely. (10.1.13)


By (10.1.8) and (10.1.11), we can assume that the sequence $(E^{\mathfrak{C}}(X_n))$ is
isotone. The assertion then follows by a passage to the limit in (10.1.4)
using the Monotone Convergence Theorem.

If a sequence $(X_n)_{n \in \mathbb{N}}$ of numerical random variables converges almost
surely to X and there exists an integrable random variable Y with $|X_n| \le Y$
for all n, then

    $\lim_{n \to \infty} E^{\mathfrak{C}}(X_n) = E^{\mathfrak{C}}(X)$, almost surely. (10.1.14)

Indeed, introduce $X_n^* = \sup_{k \ge n} X_k$ and $X_n^{**} = \inf_{k \ge n} X_k$; then
$-Y \le X_n^{**} \le X_n \le X_n^* \le Y$ for all $n \in \mathbb{N}$. Furthermore $(Y - X_n^*)$ and
$(Y + X_n^{**})$ are isotone sequences of integrable random variables with
$Y - \limsup X_n$ resp. $Y + \liminf X_n$ as supremum. Since $(X_n)$ converges
to X almost surely, the almost sure convergence of $(E^{\mathfrak{C}}(X_n^*))$ and
$(E^{\mathfrak{C}}(X_n^{**}))$ to $E^{\mathfrak{C}}(X)$ follows from (10.1.10) and (10.1.13). Hence the
inequalities $X_n^{**} \le X_n \le X_n^*$ together with (10.1.11) imply $\lim E^{\mathfrak{C}}(X_n) = E^{\mathfrak{C}}(X)$
almost surely.

A simple reformulation of the definition of E9(X) yields: 


10.1.3. Lemma. A $\mathfrak{C}$-measurable numerical random variable $X_0$ on
$\Omega$ is the conditional expectation of a numerical random variable X which
is $\ge 0$ (or integrable) if and only if

    $\int Q X_0 \, dP = \int Q X \, dP$ (10.1.15)

for all $\mathfrak{C}$-measurable random variables Q on $\Omega$ which are $\ge 0$ (or almost
surely bounded).


Proof. (10.1.15) reduces to (10.1.4) when we choose indicator functions
of events $Z \in \mathfrak{C}$ for Q. If $X \ge 0$, then we obtain (10.1.15) from
(10.1.4) for all $\mathfrak{C}$-elementary functions Q on $\Omega$. A passage to the limit
yields the assertion. If X is integrable, we decompose it into its positive
and negative part. ∎
Now we obtain the following smoothing properties of conditional
expectation.

Let X and Y be both $\ge 0$ or let X be almost surely bounded and Y
integrable. Then

    X $\mathfrak{C}$-measurable $\Rightarrow$ $E^{\mathfrak{C}}(XY) = X E^{\mathfrak{C}}(Y)$, almost surely. (10.1.16)


Suppose we are in the first case. If $Q \ge 0$ is $\mathfrak{C}$-measurable, then by the
above lemma it follows that $\int QXY \, dP = \int QX E^{\mathfrak{C}}(Y) \, dP$ since QX is
$\mathfrak{C}$-measurable. But on the other hand, $\int QXY \, dP = \int Q E^{\mathfrak{C}}(XY) \, dP$. Due to
the $\mathfrak{C}$-measurability of $X E^{\mathfrak{C}}(Y)$, it now follows that $E^{\mathfrak{C}}(XY) = X E^{\mathfrak{C}}(Y)$
almost surely. We proceed analogously in the second case.


Under the hypotheses of (10.1.16),

    $E^{\mathfrak{C}}(Y E^{\mathfrak{C}}(X)) = E^{\mathfrak{C}}(Y) E^{\mathfrak{C}}(X)$, almost surely. (10.1.17)


This follows from (10.1.16) if $E^{\mathfrak{C}}(X)$ plays the role of X there. By (10.1.9),
(10.1.11), and (10.1.12), when X is almost surely bounded, $E^{\mathfrak{C}}(X)$ also is.


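For σ-algebras $\mathfrak{C}_1$, $\mathfrak{C}_2$ in $\mathfrak{A}$, we have

    $\mathfrak{C}_1 \subset \mathfrak{C}_2 \subset \mathfrak{A}$ $\Rightarrow$ $E^{\mathfrak{C}_2}(E^{\mathfrak{C}_1}(X)) = E^{\mathfrak{C}_1}(E^{\mathfrak{C}_2}(X)) = E^{\mathfrak{C}_1}(X)$, almost surely. (10.1.18)

$E^{\mathfrak{C}_1}(X)$ is $\mathfrak{C}_1$- and thus $\mathfrak{C}_2$-measurable; by (10.1.7) it then follows that
$E^{\mathfrak{C}_2}(E^{\mathfrak{C}_1}(X)) = E^{\mathfrak{C}_1}(X)$ almost surely. Moreover, $\int_Z E^{\mathfrak{C}_2}(X) \, dP = \int_Z X \, dP$
for all $Z \in \mathfrak{C}_2$ and $\int_Z E^{\mathfrak{C}_1}(X) \, dP = \int_Z X \, dP$ for all $Z \in \mathfrak{C}_1$; hence
$\int_Z E^{\mathfrak{C}_2}(X) \, dP = \int_Z E^{\mathfrak{C}_1}(X) \, dP$ for all $Z \in \mathfrak{C}_1$. It follows that
$E^{\mathfrak{C}_1}(E^{\mathfrak{C}_2}(X)) = E^{\mathfrak{C}_1}(X)$ almost surely.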
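
The iteration rule (10.1.18) can be verified exactly on a finite space, where each σ-subalgebra is generated by a partition. The sketch below is illustrative only: the eight-point space, the uniform measure, the variable X and the two nested partitions are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical model: Omega = {0,...,7}, uniform P, X(w) = w^3.
P = {w: Fraction(1, 8) for w in range(8)}
X = {w: Fraction(w ** 3) for w in range(8)}

def cond_exp(X, P, blocks):
    """Conditional expectation given the sigma-algebra generated by a partition."""
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        mean_on_B = sum(X[w] * P[w] for w in B) / pB
        out.update({w: mean_on_B for w in B})
    return out

coarse = [[0, 1, 2, 3], [4, 5, 6, 7]]    # generates C_1
fine = [[0, 1], [2, 3], [4, 5], [6, 7]]  # generates C_2, which contains C_1

e1 = cond_exp(X, P, coarse)                          # E^{C_1}(X)
outer_fine = cond_exp(e1, P, fine)                   # E^{C_2}(E^{C_1}(X))
outer_coarse = cond_exp(cond_exp(X, P, fine), P, coarse)  # E^{C_1}(E^{C_2}(X))
```

Both iterated expectations coincide with `e1`: smoothing first with the finer partition and then with the coarser one loses nothing beyond what the coarser one already discards.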

As a last property, we discuss the behavior of conditional expectation 
with respect to independence: 


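10.1.4. Theorem. Let $\mathfrak{C}_1$ and $\mathfrak{C}_2$ be σ-subalgebras of $\mathfrak{A}$, $\mathfrak{C} = \mathfrak{A}(\mathfrak{C}_1,\mathfrak{C}_2)$
the σ-algebra generated by $\mathfrak{C}_1$ and $\mathfrak{C}_2$, and X an integrable random
variable. If X and $\mathfrak{C}_1$ are independent of $\mathfrak{C}_2$, that is, the σ-algebra $\mathfrak{A}(\mathfrak{A}(X), \mathfrak{C}_1)$
generated by X and $\mathfrak{C}_1$ is independent of $\mathfrak{C}_2$, then

    $E^{\mathfrak{C}}(X) = E^{\mathfrak{C}_1}(X)$, almost surely.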


Proof. Let $X_0$ be a version of $E^{\mathfrak{C}_1}(X)$. This is then $\mathfrak{C}$-measurable and,
as we shall show, is also a version of $E^{\mathfrak{C}}(X)$. Thus we have to verify that
$\int_Z X_0 \, dP = \int_Z X \, dP$ for all $Z \in \mathfrak{C}$. Because of the integrability of X, the
system $\mathfrak{D}$ of all $Z \in \mathfrak{C}$ for which $\int_Z X_0 \, dP = \int_Z X \, dP$ is a Dynkin
system. By Theorem 1.2.3 it thus suffices to prove the equality for all sets
Z of an $\cap$-stable generator of $\mathfrak{C}$. Such a generator is given by

    $\mathfrak{E} = \{Z_1 \cap Z_2 : Z_1 \in \mathfrak{C}_1, Z_2 \in \mathfrak{C}_2\}$.

For every set $Z_1 \cap Z_2 \in \mathfrak{E}$ we now have

    $\int_{Z_1 \cap Z_2} X_0 \, dP = E(1_{Z_1} 1_{Z_2} X_0) = E(1_{Z_2}) E(1_{Z_1} X_0)$,

since $1_{Z_1} X_0$ is $\mathfrak{C}_1$-measurable, $1_{Z_2}$ is $\mathfrak{C}_2$-measurable, and the σ-algebras
$\mathfrak{C}_1$ and $\mathfrak{C}_2$ are independent by Section 5.1, Consequence 2, p. 151.
Further, by the definition of $X_0$,

    $E(1_{Z_1} X_0) = E(1_{Z_1} X)$

and hence

    $\int_{Z_1 \cap Z_2} X_0 \, dP = E(1_{Z_2}) E(1_{Z_1} X)$.

But $1_{Z_2}$ and $1_{Z_1} X$ are independent random variables by hypothesis.
Another application of the Multiplication Theorem 5.3.1 now yields

    $\int_{Z_1 \cap Z_2} X_0 \, dP = E(1_{Z_1} 1_{Z_2} X) = \int_{Z_1 \cap Z_2} X \, dP$.

But this is what we needed to show. ∎


10.1.5. Corollary. If an admissible random variable X is independent
of a σ-subalgebra $\mathfrak{C}$ of $\mathfrak{A}$, then

    $E^{\mathfrak{C}}(X) = E(X)$, almost surely. (10.1.19)


Proof. For integrable X this follows from Theorem 10.1.4 if we choose
$\mathfrak{C}_1 = \{\emptyset,\Omega\}$ and $\mathfrak{C}_2 = \mathfrak{C}$. But the assertion also holds for nonintegrable
random variables $X \ge 0$, since, by the Multiplication Theorem, $\int_Z X \, dP =
E(1_Z X) = P(Z) E(X) = \int_Z E(X) \, dP$ for all $Z \in \mathfrak{C}$. ∎
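
Corollary 10.1.5 can be illustrated with two fair dice. This is a minimal sketch under assumed data (the dice model and the choice of X are not from the text): X depends only on the first die, the σ-subalgebra is generated by the second die, and the conditional expectation is then the constant E(X).

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: two independent fair dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """A random variable depending on the first die only."""
    return w[0] ** 2

# Partition generating the sigma-algebra of the second die.
blocks = [[w for w in omega if w[1] == b] for b in range(1, 7)]

def cond_exp(h, w):
    """Value at w of the conditional expectation of h given the partition."""
    B = next(b for b in blocks if w in b)
    pB = sum(P[v] for v in B)
    return sum(h(v) * P[v] for v in B) / pB

EX = sum(X(w) * P[w] for w in omega)  # ordinary expectation E(X)
```

Every value `cond_exp(X, w)` equals `EX`, which is what (10.1.19) asserts in this discrete setting.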


By a simple specialization, we finally arrive at the general concept of 
conditional probability: 


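10.1.6. Definition. Let $(\Omega,\mathfrak{A},P)$ be a probability space, $\mathfrak{C}$ a σ-subalgebra
of $\mathfrak{A}$, and $A \in \mathfrak{A}$ an event. Then

    $P^{\mathfrak{C}}(A) = P(A \mid \mathfrak{C}) = E^{\mathfrak{C}}(1_A)$ (10.1.20)

is called the conditional probability of A given $\mathfrak{C}$.


Thus $P^{\mathfrak{C}}(A)$ is an integrable $\mathfrak{C}$-measurable function, $\ge 0$ almost surely,
for which

    $\int_Z P^{\mathfrak{C}}(A) \, dP = \int_Z 1_A \, dP = P(A \cap Z)$ (10.1.21)

for all $Z \in \mathfrak{C}$. Hence $P^{\mathfrak{C}}(A)$ is almost surely uniquely determined.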
Example


5. Suppose we have the situation of Example 3. Then obviously

    $P^{\mathfrak{C}}(A) = \sum_{i \in I} P_{B_i}(A)\, 1_{B_i}$, almost surely.

For events $B_i$ with $P(B_i) = 0$ we can again choose $P_{B_i}(A)$ arbitrarily.


By specializing properties of conditional expectation, we obtain, in
particular, the following properties of conditional probability:

    $0 \le P^{\mathfrak{C}}(A) \le 1$, almost surely. (10.1.22)

    $P^{\mathfrak{C}}(\emptyset) = 0$, almost surely; $P^{\mathfrak{C}}(\Omega) = 1$, almost surely. (10.1.23)

    $A_1 \subset A_2$ ($A_i \in \mathfrak{A}$) $\Rightarrow$ $P^{\mathfrak{C}}(A_1) \le P^{\mathfrak{C}}(A_2)$, almost surely. (10.1.24)

For every sequence $(A_n)_{n \in \mathbb{N}}$ of pairwise disjoint events from $\mathfrak{A}$,

    $P^{\mathfrak{C}}\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = \sum_{n=1}^{\infty} P^{\mathfrak{C}}(A_n)$, almost surely. (10.1.25)

Here (10.1.22)-(10.1.24) follow from (10.1.9) and (10.1.11). The last
property follows from (10.1.14) and (10.1.10).


Remark. 2. The properties given here do not say that the function
$A \mapsto [P^{\mathfrak{C}}(A)](\omega)$ is a probability measure on $\mathfrak{A}$ for almost all $\omega \in \Omega$; because
in each of the four given properties, we have null sets depending on the
given events, for example, in (10.1.25) depending on the sequence $(A_n)$. The
union of these generally uncountably many null sets is usually not a null
set. (See Problem 2.)


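PROBLEMS

1. Consider the probability space $(\mathbb{R}^p,\mathfrak{B}^p,P)$, where $P = \mu \otimes \cdots \otimes \mu$
and where $\mu \in M(\mathbb{R})$. Let $\mathfrak{C}$ be the system of all Borel sets $B \in \mathfrak{B}^p$
such that for every permutation $i_1, \dots, i_p$ of $1, \dots, p$ and every
point $x = (x_1, \dots, x_p) \in B$ we have $(x_{i_1}, \dots, x_{i_p}) \in B$. Prove:
$\mathfrak{C}$ is a σ-subalgebra of $\mathfrak{B}^p$, and

    $x \mapsto \frac{1}{p!} \sum_{i_1, \dots, i_p} X(x_{i_1}, \dots, x_{i_p})$

(summation over all permutations of $1, \dots, p$) is a version of
$E(X \mid \mathfrak{C})$ for every admissible random variable X.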

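2. Let $(\Omega,\mathfrak{A},P)$ be the probability space $\Omega = [0,1]$, $\mathfrak{A} = \Omega \cap \mathfrak{B}^1$, $P = \lambda^1_\Omega$.
Let $f: \mathfrak{A} \to \Omega$ be the following mapping: $f(A) = \sup A$ for $A \ne \emptyset$
and $f(\emptyset) = 0$. Define $Q: \Omega \times \mathfrak{A} \to \mathbb{R}$ to be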

Q(w,A): = la(w) + luco (o). 


Prove: $\omega \mapsto Q(\omega,A)$ is a version of $P(A \mid \mathfrak{A})$ for all $A \in \mathfrak{A}$. There
exists no P-null set $N \in \mathfrak{A}$ such that $A \mapsto Q(\omega,A)$ is a probability
measure for all $\omega \in \complement N$.
3. Let X and Y be real random variables on a probability space $(\Omega,\mathfrak{A},P)$,
and let $\mathfrak{C}$ be a σ-subalgebra of $\mathfrak{A}$. Assume $X \in \mathcal{L}^p(P)$ and $Y \in \mathcal{L}^q(P)$,
where $1 < p < +\infty$ and $p^{-1} + q^{-1} = 1$. Prove:

(a) $|E^{\mathfrak{C}}(XY)| \le E^{\mathfrak{C}}(|X|^p)^{1/p} \, E^{\mathfrak{C}}(|Y|^q)^{1/q}$, almost surely
(b) $|E^{\mathfrak{C}}(X)|^p \le E^{\mathfrak{C}}(|X|^p)$, almost surely

4. Let $\mathfrak{C}$ and $\mathfrak{C}'$ be σ-subalgebras of the σ-algebra $\mathfrak{A}$ of a probability
space. Prove: $\mathfrak{C}$ and $\mathfrak{C}'$ are independent if and only if

    $P(Z' \mid \mathfrak{C}) = P(Z')$, almost surely

for all $Z' \in \mathfrak{C}'$. Formulate the corresponding result for the independence
of two random variables.

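5. Let $(\Omega,\mathfrak{A},P)$ be a probability space and let Y be a real, integrable
random variable $\ge 0$ on it. Let $\mathfrak{C}_1$ and $\mathfrak{C}_2$ be σ-subalgebras of $\mathfrak{A}$, and
denote by $\mathfrak{C}_3$ the σ-algebra $\mathfrak{A}(\mathfrak{C}_1,\mathfrak{C}_2)$ generated by $\mathfrak{C}_1$ and $\mathfrak{C}_2$. Prove
the equivalence of the following two statements: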

(a) E(Y | 33) = E(Y)| 33), almost surely 
(b) E(XY | 33 = E(X | 3) E(Y | 33), almost surely 
for all 3-measurable real random variables X z 0. 


10.2 FACTORIZATION OF CONDITIONAL EXPECTATION 


The connection between conditional expectation relative to a σ-subalgebra
$\mathfrak{C}$ of $\mathfrak{A}$ as just explained and the elementary concept of conditional
expectation relative to an event of positive probability is closer than one
would first expect. This shows up clearly when $\mathfrak{C}$ is the σ-algebra $\mathfrak{A}(Y)$
generated by a random variable Y. The reason for this is shown by the
following lemma, which gives $\mathfrak{A}(Y)$-measurability a very intuitive
meaning.


10.2.1. Lemma. Let $Y: \Omega \to \Omega'$ be a mapping of a set $\Omega$ into a
measurable space $(\Omega',\mathfrak{A}')$ and let $Z: \Omega \to \bar{\mathbb{R}}$ be a numerical function on $\Omega$.
Then Z is measurable with respect to the σ-algebra $\mathfrak{A}(Y)$ in $\Omega$ generated
by Y if and only if there is a measurable numerical function g on $(\Omega',\mathfrak{A}')$
such that

    $Z = g \circ Y$. (10.2.1)


Proof. If Z is of the form $Z = g \circ Y$, then Z, as the composition of an
$\mathfrak{A}(Y)$-$\mathfrak{A}'$-measurable mapping and an $\mathfrak{A}'$-$\bar{\mathfrak{B}}^1$-measurable mapping, is $\mathfrak{A}(Y)$-$\bar{\mathfrak{B}}^1$-measurable.
For the proof of the converse, we distinguish several cases:


1. Let $Z = \sum_{i=1}^{n} \alpha_i 1_{A_i}$ be an $\mathfrak{A}(Y)$-elementary function, that is, $A_i \in
\mathfrak{A}(Y)$ and $\alpha_i \in \mathbb{R}_+$ ($i = 1, \dots, n$). For every $A_i$ there is a set $A_i' \in \mathfrak{A}'$
with $A_i = Y^{-1}(A_i')$. Therefore $g = \sum_{i=1}^{n} \alpha_i 1_{A_i'}$ yields the desired result.

2. Suppose $Z \ge 0$. Then there is an isotone sequence $(Z_n)_{n \in \mathbb{N}}$ of $\mathfrak{A}(Y)$-elementary
functions with $Z = \sup Z_n$ and, according to the first case, for
every $Z_n$ there is an $\mathfrak{A}'$-elementary function $g_n$ with $Z_n = g_n \circ Y$. But then
$g = \sup g_n$ yields the desired result.

3. In the general case we decompose Z into the positive part $Z^+$ and the
negative part $Z^-$. By case 2, there are $\mathfrak{A}'$-measurable functions $g_0' \ge 0$ and
$g_0'' \ge 0$ on $\Omega'$ with $Z^+ = g_0' \circ Y$ and $Z^- = g_0'' \circ Y$. The difference $g_0'(\omega') -
g_0''(\omega')$ is not defined on the set $U' = \{g_0' = +\infty\} \cap \{g_0'' = +\infty\}$. But
the set $Y(\Omega)$ is disjoint from $U'$, since $Z(\omega) = Z^+(\omega) - Z^-(\omega) = g_0'(Y(\omega)) -
g_0''(Y(\omega))$ for all $\omega \in \Omega$. Therefore, if we set

    $g' = 1_{\complement U'}\, g_0'$ and $g'' = 1_{\complement U'}\, g_0''$,

then $g = g' - g''$ yields the desired result. ∎
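
On a finite set the content of Lemma 10.2.1 is easy to see: if every subset of the image space is measurable, then Z is measurable with respect to the σ-algebra generated by Y exactly when Z is constant on each fibre of Y, and g simply records that constant. The following sketch works under that finiteness assumption; the function name `factor_through` and the sample data are hypothetical.

```python
def factor_through(Y, Z):
    """Return g (a dict on the image of Y) with Z = g o Y,
    or None if Z takes two different values on some fibre of Y."""
    g = {}
    for w, y in Y.items():
        if y in g and g[y] != Z[w]:
            return None  # Z is not a function of Y
        g[y] = Z[w]
    return g

# Hypothetical data on Omega = {0, 1, 2, 3}.
Y = {0: "a", 1: "a", 2: "b", 3: "c"}
Z_ok = {0: 1.5, 1: 1.5, 2: -2.0, 3: 0.0}   # constant on the fibres of Y
Z_bad = {0: 1.0, 1: 2.0, 2: 0.0, 3: 0.0}   # separates 0 and 1, which Y does not

g = factor_through(Y, Z_ok)
```

Note that `g` is only determined on the image of Y, in line with the remark that follows the lemma.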


Remark. The restriction of g to $Y(\Omega)$ is uniquely determined. For every
$\omega' \in Y(\Omega)$, we have $g(\omega') = Z(\omega)$ for all $\omega \in \Omega$ with $Y(\omega) = \omega'$. Therefore if
$Y(\Omega) = \Omega'$ or at least $Y(\Omega) \in \mathfrak{A}'$, then we can obtain a function g by
constructing the uniquely determined restriction on $Y(\Omega)$ according to the
introductory statement and defining, say, $g(\omega') = 0$ for all $\omega' \in \complement Y(\Omega)$.
Here we can also replace $(\bar{\mathbb{R}},\bar{\mathfrak{B}}^1)$ by an arbitrary measurable space in
which the one-point subsets are measurable. The lemma is therefore
noteworthy insofar as we can get along without the measurability of $Y(\Omega)$.² For
this the special structure of $(\bar{\mathbb{R}},\bar{\mathfrak{B}}^1)$ is used in an essential way.


We apply the lemma to the following situation. Let X again be a
numerical random variable on $(\Omega,\mathfrak{A},P)$ which is either $\ge 0$ or integrable
and let $Y: (\Omega,\mathfrak{A}) \to (\Omega',\mathfrak{A}')$ be an $(\Omega',\mathfrak{A}')$-random variable on $\Omega$. Since the
numerical random variable $E^Y(X)$ is $\mathfrak{A}(Y)$-measurable, then by Lemma
10.2.1 there is a measurable numerical function g on $(\Omega',\mathfrak{A}')$ with

    $E^Y(X) = g \circ Y$. (10.2.2)

With the aid of the distribution $P_Y$ of Y we can characterize g, uniquely
determined only when $Y(\Omega) = \Omega'$, as follows:


10.2.2. Theorem. Every $\mathfrak{A}'$-measurable numerical function g with
property (10.2.2) satisfies the equality

    $\int_{A'} g \, dP_Y = \int_{Y^{-1}(A')} X \, dP$, for all $A' \in \mathfrak{A}'$ (10.2.3)

and is therefore $P_Y$-almost surely uniquely determined.
2 As an example, for a continuous mapping f: E — F of a Polish space E into a 


Polish space F, the image f(E) need not be Borel in F. See the theory of so-called 
Souslin or analytic sets (Bourbaki [26] and Meyer [39]). 
Proof. For every $A' \in \mathfrak{A}'$,

    $\int_{A'} g \, dP_Y = \int 1_{A'}\, g \, dP_Y = \int (1_{A'} \circ Y)(g \circ Y) \, dP$
    $= \int_{Y^{-1}(A')} g \circ Y \, dP = \int_{Y^{-1}(A')} E^Y(X) \, dP = \int_{Y^{-1}(A')} X \, dP$.

The last equality is obtained from the definition of conditional expectation.
Now if h is another $\mathfrak{A}'$-measurable function with $E^Y(X) = h \circ Y$, then, by
the above, $\int_{A'} g \, dP_Y = \int_{A'} h \, dP_Y$ for all $A' \in \mathfrak{A}'$. Now either $X \ge 0$ or X
is integrable. In the first case, as in the proof of Theorem 10.1.1, (10.2.3)
implies that $g \ge 0$ and $h \ge 0$ $P_Y$-almost surely. Therefore, by Theorem
2.9.9, $g = h$ $P_Y$-almost surely. In the second case, g and h are $P_Y$-integrable
by the Transformation Theorem, since $E^Y(X) = g \circ Y = h \circ Y$ is
P-integrable. Decomposition into positive and negative parts then yields
$\int_{A'} (g^+ + h^-) \, dP_Y = \int_{A'} (g^- + h^+) \, dP_Y$ for all $A' \in \mathfrak{A}'$, that is, $g^+ +
h^- = g^- + h^+$ $P_Y$-almost surely by Theorem 2.9.9. Since g and h are
$P_Y$-almost surely finite, we have that $g = h$ $P_Y$-almost surely. ∎


Now if we assume that the set $\{y\}$ is an event in $\mathfrak{A}'$ for some element
$y \in \Omega'$, then from (10.2.3) for $A' = \{y\}$ it follows that

    $g(y)\, P\{Y = y\} = \int_{\{Y = y\}} X \, dP$.

Therefore, if $P\{Y = y\} = P_Y(\{y\}) > 0$, this equality implies that

    $g(y) = \frac{1}{P\{Y = y\}} \int_{\{Y = y\}} X \, dP = E_{\{Y = y\}}(X)$ (10.2.4)

and hence, by (10.2.2),

    $E^Y(X)(\omega) = E_{\{Y = y\}}(X)$, for all $\omega \in \{Y = y\}$. (10.2.5)


Thus, on $\{Y = y\}$ with $P\{Y = y\} > 0$, every version of the conditional
expectation $E^Y(X)$ is constant and equal to the conditional expectation of X
given $\{Y = y\}$.

Thus, we see that the situation of Example 3 of Section 10.1 was not as
special as it seemed at first. If, there, we let $\Omega' = I$ and $\mathfrak{A}' = \mathfrak{P}(\Omega')$ as
well as $Y(\omega) = i$ for $\omega \in B_i$ ($i \in I$), then Y is an $(\Omega',\mathfrak{A}')$-random variable
with $\mathfrak{C} = \mathfrak{A}(Y)$. Then, further, $g(i) = E_{B_i}(X)$ for all $i \in I$ with $P(B_i) > 0$.

In general, however, $P\{Y = y\}$ will be equal to zero for most points
$y \in \Omega'$. For example, we can think of the case of a real random variable Y
whose distribution $P_Y$ has a density with respect to Lebesgue measure.
Nonetheless, the value of g at an arbitrary point $y \in \Omega'$ is then still at our
disposal. For $y \in Y(\Omega)$, this is even uniquely determined by $E^Y(X)$. In
the general case, we can interpret g(y) as the mean of the values which X
attains at elementary events $\omega \in \Omega$ satisfying $Y(\omega) = y$. Therefore we
define:


10.2.3. Definition. For every $y \in \Omega'$, g(y) is called the conditional
expectation of X given that Y equals y; in symbols,

    $E^{Y=y}(X) = E(X \mid Y = y) = g(y)$.


Thus, while $E^Y(X)$ is a random variable, $E^{Y=y}(X)$ is a number. By (10.2.2),
from $y \mapsto E^{Y=y}(X)$ we again obtain the conditional expectation $E^Y(X)$ in
the form $\omega \mapsto E^{Y=Y(\omega)}(X)$.
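
For a discrete Y, formulas (10.2.3) and (10.2.4) can be evaluated directly. The sketch below assumes a model that is not in the text: two fair dice, X the first die and Y the sum of both. By symmetry g(y) = E(X | Y = y) should equal y/2, and the integral identity (10.2.3) becomes a finite sum.

```python
from fractions import Fraction

# Hypothetical model: two fair dice; X = first die, Y = sum of both.
omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    return w[0]

def Y(w):
    return w[0] + w[1]

def g(y):
    """(10.2.4): g(y) = (1 / P{Y = y}) * integral of X over {Y = y}."""
    event = [w for w in omega if Y(w) == y]
    p = sum(P[w] for w in event)
    return sum(X(w) * P[w] for w in event) / p

def P_Y(y):
    """Distribution of Y at the point y."""
    return sum(P[w] for w in omega if Y(w) == y)

def both_sides(A_prime):
    """Left and right side of (10.2.3) for a finite set A' of values of Y."""
    lhs = sum(g(y) * P_Y(y) for y in A_prime)
    rhs = sum(X(w) * P[w] for w in omega if Y(w) in A_prime)
    return lhs, rhs
```

Composing `g` with `Y` recovers the random variable $E^Y(X)$, as in (10.2.2).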


PROBLEMS

1. Let X and Y be random variables on $(\Omega,\mathfrak{A},P)$ with the properties
mentioned in connection with (10.2.2). Prove that $E(X \mid Y = y)$
can be defined as follows: Up to equality $P_Y$-almost everywhere,
$y \mapsto E(X \mid Y = y)$ is the only $\mathfrak{A}'$-measurable numerical function on
$\Omega'$ which is $\ge 0$ for $X \ge 0$ or $P_Y$-integrable for integrable X and
satisfies

    $\int_{A'} E(X \mid Y = y)\, P_Y(dy) = \int_{Y^{-1}(A')} X \, dP$,

for all $A' \in \mathfrak{A}'$.

2. Let Y be an $(\Omega',\mathfrak{A}')$-random variable on $(\Omega,\mathfrak{A},P)$, and let $X_1$ and $X_2$
be numerical random variables on $\Omega$ which are both $\ge 0$ or both
integrable. Prove that the following equalities hold for $P_Y$-almost
all $y \in \Omega'$:

(a) $X_1$ = const $\Rightarrow$ $E(X_1 \mid Y = y)$ = const;

(b) $E(\alpha X_1 + \beta X_2 \mid Y = y) = \alpha E(X_1 \mid Y = y) + \beta E(X_2 \mid Y = y)$
($\alpha, \beta \ge 0$ or $\alpha, \beta \in \mathbb{R}$);

(c) $X_1 \le X_2$ almost surely $\Rightarrow$ $E(X_1 \mid Y = y) \le E(X_2 \mid Y = y)$;

(d) for every increasing sequence $(X_n)$ of random variables $X_n \ge 0$
one has

    $\sup_{n \in \mathbb{N}} E(X_n \mid Y = y) = E(\sup_{n \in \mathbb{N}} X_n \mid Y = y)$, $P_Y$-almost surely.

3. Let Y be an $(\Omega',\mathfrak{A}')$-random variable on $(\Omega,\mathfrak{A},P)$. Then $E(1_A \mid Y = y)$
is called the conditional probability of $A \in \mathfrak{A}$ given that Y equals
$y \in \Omega'$. It is denoted by $P(A \mid Y = y) = P^{Y=y}(A)$. Prove:

(a) $P(A \cap \{Y \in A'\}) = \int_{A'} P(A \mid Y = y)\, P_Y(dy)$ for all $A \in \mathfrak{A}$, $A' \in \mathfrak{A}'$;

(b) $P(\emptyset \mid Y = y) = 0$, $P_Y$-almost surely;

(c) $P(\Omega \mid Y = y) = 1$, $P_Y$-almost surely;

(d) $P\bigl(\bigcup_{n=1}^{\infty} A_n \mid Y = y\bigr) = \sum_{n=1}^{\infty} P(A_n \mid Y = y)$, $P_Y$-almost surely

for every sequence $(A_n)$ of pairwise disjoint events $A_n \in \mathfrak{A}$.


10.3 KERNELS, EXPECTATION KERNELS, 
AND CONDITIONAL DISTRIBUTIONS 


Remark 2 at the end of Section 10.1 raises the question of the assumptions
needed on the probability space $(\Omega,\mathfrak{A},P)$ and a given σ-subalgebra
$\mathfrak{C}$ of $\mathfrak{A}$ so that there exists a version $P^{\mathfrak{C}}(A)$ of the conditional probability
for which $A \mapsto [P^{\mathfrak{C}}(A)](\omega)$ is a probability measure on $\mathfrak{A}$ for every $\omega \in \Omega$.
In other words, we are interested in the question of the existence of a function
$P_{\mathfrak{C}}: \Omega \times \mathfrak{A} \to \mathbb{R}$, associated with $\mathfrak{C}$, such that $\omega \mapsto P_{\mathfrak{C}}(\omega,A)$ is a version
of $P^{\mathfrak{C}}(A)$ for each $A \in \mathfrak{A}$ and $A \mapsto P_{\mathfrak{C}}(\omega,A)$ is a probability measure on $\mathfrak{A}$
for each $\omega \in \Omega$. We then call $P_{\mathfrak{C}}$ an expectation kernel for $\mathfrak{C}$. Here we
encounter the important concept of kernel for the first time.


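10.3.1. Definition. Suppose we are given two measurable spaces
$(\Omega,\mathfrak{A})$ and $(\Omega',\mathfrak{A}')$. A kernel from $(\Omega,\mathfrak{A})$ to $(\Omega',\mathfrak{A}')$ or, briefly, from $\Omega$ to $\Omega'$,
is a numerical function K on $\Omega \times \mathfrak{A}'$ with the following two properties:

    $\omega \mapsto K(\omega,A')$ is $\mathfrak{A}$-measurable for every $A' \in \mathfrak{A}'$; (10.3.1)
    $A' \mapsto K(\omega,A')$ is a measure on $\mathfrak{A}'$ for every $\omega \in \Omega$. (10.3.2)

A kernel K is called Markov [sub-Markov] if $K(\omega,\Omega') = 1$ [$K(\omega,\Omega') \le 1$]
for all $\omega \in \Omega$.

If $\Omega = \Omega'$ and $\mathfrak{A} = \mathfrak{A}'$, then we also speak of a kernel on $(\Omega,\mathfrak{A})$ or,
briefly, on $\Omega$.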


For a Markov kernel, every measure $A' \mapsto K(\omega,A')$ is thus a probability
measure. An expectation kernel for a σ-subalgebra $\mathfrak{C}$ of $\mathfrak{A}$ [and a given
probability space $(\Omega,\mathfrak{A},P)$] is thus a Markov kernel from $(\Omega,\mathfrak{C})$ to $(\Omega,\mathfrak{A})$.
The importance of the concept of kernel is emphasized by the following:


Examples 


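1. Let $\varphi: (\Omega,\mathfrak{A}) \to (\Omega',\mathfrak{A}')$ be a measurable mapping of a measurable space
$(\Omega,\mathfrak{A})$ into a second measurable space $(\Omega',\mathfrak{A}')$. Further, let $\mu$ be a measure
on $\mathfrak{A}$. Then

    $K(\omega,A') = \varphi(\mu)(A') = \mu(\varphi^{-1}(A'))$ ($\omega \in \Omega$, $A' \in \mathfrak{A}'$)

defines a kernel (independent of $\omega$) from $(\Omega,\mathfrak{A})$ to $(\Omega',\mathfrak{A}')$. Every image
measure is a kernel in this sense.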
2. Suppose $(\pi_{ij})_{(i,j) \in \mathbb{N} \times \mathbb{N}}$ is a double sequence (or, in view of applications,
a matrix with countably infinitely many rows and columns) with
numbers $0 \le \pi_{ij} \le +\infty$ as elements. On the measurable space $(\mathbb{N},\mathfrak{P}(\mathbb{N}))$
of the natural numbers (with discrete topology), this matrix defines the
following kernel:

    $K(i,A) = \sum_{j \in A} \pi_{ij}$ ($i \in \mathbb{N}$, $A \in \mathfrak{P}(\mathbb{N})$).

In this way we obtain a bijection of the $\mathbb{N} \times \mathbb{N}$-matrices onto the kernels
on $\mathbb{N}$. The matrix associated with a kernel K on $\mathbb{N}$ is obviously the matrix
$(\pi_{ij})$ where $\pi_{ij} = K(i,\{j\})$ for all $(i,j) \in \mathbb{N} \times \mathbb{N}$. K is Markov (sub-Markov)
if and only if all row sums $\sum_{j \in \mathbb{N}} \pi_{ij} = 1$ ($\le 1$).³

[There are many possible interpretations of this example: In the
Markov case, $(\pi_{ij})$ might describe the motion of a particle wandering at
random in the set $\mathbb{N}$ of natural numbers. Then $\pi_{ij}$ is to be interpreted as
the probability that the particle passes from "state" $i \in \mathbb{N}$ to state
$j \in \mathbb{N}$.]


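3. Let $(\Omega,\mathfrak{A},\mu)$ be a measure space and $k \ge 0$ a numerical, $\mathfrak{A} \otimes \mathfrak{A}$-measurable
function defined on $\Omega \times \Omega$. Then

    $K(\omega,A) = \int_A k(\omega,\omega')\, \mu(d\omega')$

is a kernel on $(\Omega,\mathfrak{A})$.
This example is basic for potential theory: There $\Omega = \mathbb{R}^p$, $\mathfrak{A} = \mathfrak{B}^p$, $\mu$ is a
Borel measure on $\mathbb{R}^p$, and for dimension $p \ge 3$, the function k is the
Newtonian "kernel"

    $k(x,y) = 1/|x - y|^{p-2}$ for $x \ne y$, and $k(x,y) = +\infty$ for $x = y$.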


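4. Let $(\Omega,\mathfrak{A},P)$ be a probability space. Then for the σ-algebras

    $\mathfrak{C} = \mathfrak{C}_0 = \{\emptyset,\Omega\}$ and $\mathfrak{C} = \mathfrak{A}$

there always exists an expectation kernel. Indeed, $P_{\mathfrak{C}_0}$ is the kernel
$(\omega,A) \mapsto P_{\mathfrak{C}_0}(\omega,A) = P(A)$, independent of $\omega$, and $P_{\mathfrak{A}}$ is the kernel which
associates the unit mass $\varepsilon_\omega$ at $\omega$ with each $\omega \in \Omega$, that is, $P_{\mathfrak{A}}(\omega,A) = 1_A(\omega)$
for all $(\omega,A) \in \Omega \times \mathfrak{A}$.

The last kernel $(\omega,A) \mapsto 1_A(\omega) = \varepsilon_\omega(A)$ is called the unit kernel on $\Omega$
or $(\Omega,\mathfrak{A})$ and is denoted by I. Thus we have

    $I(\omega,A) = 1$ if $\omega \in A$, and $I(\omega,A) = 0$ if $\omega \in \complement A$, for $(\omega,A) \in \Omega \times \mathfrak{A}$.


³ Then we also say that the matrix $(\pi_{ij})$ is stochastic (sub-stochastic).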
5. Suppose we have the situation of Section 10.1, Examples 3 and 5. Then

    $P_{\mathfrak{C}}(\omega,A) = \sum_{i \in I} P_{B_i}(A)\, 1_{B_i}(\omega)$

is an expectation kernel for $\mathfrak{C}$. Indeed, every $\omega \in \Omega$ lies in exactly one $B_i$.
Therefore $A \mapsto P_{\mathfrak{C}}(\omega,A) = P_{B_i}(A)$ is a probability measure. For events
$B_i \ne \emptyset$ with $P(B_i) = 0$, we have to choose some probability measure on
$\mathfrak{A}$ for $P_{B_i}$, for example, the unit mass $\varepsilon_{\omega_i}$ of an arbitrarily chosen $\omega_i \in B_i$.


6. Suppose we have the situation of Section 10.1, Example 4. Then

    $P_{\mathfrak{C}}(\omega,A) = \frac{1}{n} \sum_{g \in G} \varepsilon_{g(\omega)}(A)$

is obviously an expectation kernel for $\mathfrak{C}$.


Example 2 suggests the following interpretation for a kernel K from
$(\Omega,\mathfrak{A})$ to $(\Omega',\mathfrak{A}')$: K describes a "diffusion process" which takes the unit
mass $\varepsilon_\omega$ at a point $\omega \in \Omega$ into the "mass distribution" on $\Omega'$ described by
the measure $A' \mapsto K(\omega,A')$.

If $\mu$ is an arbitrary measure on $\mathfrak{A}$, then this is taken into a measure $\mu'$
on $\mathfrak{A}'$ by K as follows: $\mu'(A') = \int K(\omega,A')\, \mu(d\omega)$, $A' \in \mathfrak{A}'$. That $\mu'$ is a
measure on $\mathfrak{A}'$ follows immediately from known properties of the integral.
We denote $\mu'$ by $\mu K$ and then write the equation defining $\mu K$ in the form

    $(\mu K)(A') = \int \mu(d\omega)\, K(\omega,A') = \int K(\omega,A')\, \mu(d\omega)$. (10.3.3)


If $\mathcal{E}_+$ ($\mathcal{E}'_+$) denotes the set of all numerical, nonnegative $\mathfrak{A}$- ($\mathfrak{A}'$-) measurable
functions on $\Omega$ ($\Omega'$), then K also establishes a mapping from $\mathcal{E}'_+$ into $\mathcal{E}_+$.
For every $f' \in \mathcal{E}'_+$, $\omega \mapsto \int f'(\omega')\, K(\omega,d\omega')$ is a function in $\mathcal{E}_+$. To see this,
we need only refer to Theorem 2.3.6, and thus approximate $f'$ by elementary
functions. As usual, we denote the new mapping again by K,
that is, we define

    $(Kf')(\omega) = \int f'(\omega')\, K(\omega,d\omega')$ ($\omega \in \Omega$) (10.3.4)


for every $f' \in \mathcal{E}'_+$. In particular, for indicator functions $1_{A'}$ of sets $A' \in \mathfrak{A}'$,
we then have

    $K 1_{A'}(\omega) = K(\omega,A')$ ($\omega \in \Omega$). (10.3.5)

In this new notation $K1 = 1$ ($K1 \le 1$) means precisely that K is Markov
(sub-Markov).
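
For a finite state space, the two operations (10.3.3) and (10.3.4) reduce to matrix arithmetic, in the spirit of Example 2: a measure is a row vector and acts from the left, a function is a column vector and is acted on from the right. The sketch below uses an assumed 3 x 3 stochastic matrix; all names and numbers are illustrative.

```python
from fractions import Fraction as F

# Hypothetical stochastic matrix on the states {0, 1, 2}; K[i][j] = K(i, {j}).
K = [
    [F(1, 2), F(1, 2), 0],
    [F(1, 4), F(1, 2), F(1, 4)],
    [0, F(1, 3), F(2, 3)],
]
states = range(3)

def kernel(i, A):
    """K(i, A) = sum over j in A of K(i, {j})."""
    return sum(K[i][j] for j in A)

def mu_K(mu):
    """(10.3.3): the measure mu K, given mu as a list of point masses."""
    return [sum(mu[i] * K[i][j] for i in states) for j in states]

def K_f(f):
    """(10.3.4): the function K f, given f as a list of values."""
    return [sum(K[i][j] * f[j] for j in states) for i in states]

mu = [F(1, 3), F(1, 3), F(1, 3)]
f = [F(2), F(-1), F(5)]
# Duality: integrating f against mu K equals integrating K f against mu.
lhs = sum(m * v for m, v in zip(mu_K(mu), f))
rhs = sum(m * v for m, v in zip(mu, K_f(f)))
```

Applying `K_f` to the constant function 1 returns 1 in every state, which is the condition K1 = 1 for a Markov kernel, and applying it to an indicator reproduces (10.3.5).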

Finally, we define $Kf'$ for $\mathfrak{A}'$-measurable numerical functions $f'$ on $\Omega'$ by
(10.3.4) provided $f'$ is integrable with respect to every measure $A' \mapsto
K(\omega,A')$, $\omega \in \Omega$. Then $Kf'$ is a real $\mathfrak{A}$-measurable function on $\Omega$. If K is
sub-Markov, then K can always be interpreted as a linear positive mapping
of the vector space of bounded $\mathfrak{A}'$-measurable functions on $\Omega'$ into the
vector space of bounded $\mathfrak{A}$-measurable functions on $\Omega$.

For a kernel K from $(\Omega,\mathfrak{A})$ to $(\Omega',\mathfrak{A}')$, the mapping $K: \mathcal{E}'_+ \to \mathcal{E}_+$ is
obviously additive, positive-homogeneous, and Daniell-continuous, that is,

    $\sup_n K f_n' = K(\sup_n f_n')$, (10.3.6)

for every isotone sequence $(f_n')$ in $\mathcal{E}'_+$. The latter follows directly from the
theorem on monotone convergence. The convention of using K to denote
the mappings of $\mathcal{E}'_+$ into $\mathcal{E}_+$ associated with K is justified by the following
lemma. It concludes our consideration of general kernels.


10.3.2. Lemma. Let $(\Omega,\mathfrak{A})$ and $(\Omega',\mathfrak{A}')$ be two measurable spaces.
Then for every additive, positive-homogeneous, Daniell-continuous
mapping $N: \mathcal{E}'_+ \to \mathcal{E}_+$ there exists exactly one kernel K from $(\Omega,\mathfrak{A})$ to
$(\Omega',\mathfrak{A}')$ such that $N(f') = Kf'$ for all $f' \in \mathcal{E}'_+$.


Proof. Because of (10.3.5), only $K(\omega,A') = [N(1_{A'})](\omega)$ can be considered
as a definition of K on $\Omega \times \mathfrak{A}'$. We can verify immediately that K
is a kernel which yields the desired result by Theorem 2.3.6. ∎


Example 


7. Let I be the unit kernel defined in Example 4 on a measurable space
$(\Omega,\mathfrak{A})$. Then obviously

    $If = f$ and $\mu I = \mu$

for all functions $f \in \mathcal{E}_+$ and all measures $\mu$ on $\mathfrak{A}$.


Now we turn back to the study of expectation kernels. If there exists an
expectation kernel $P_{\mathfrak{C}}$ for a probability space $(\Omega,\mathfrak{A},P)$ and a σ-subalgebra
$\mathfrak{C}$ of $\mathfrak{A}$, then this kernel plays the same role relative to conditional expectation
as P relative to ordinary expectation in the equality $E(X) = \int X \, dP$.
If X is a numerical random variable $\ge 0$, then for every version of $E^{\mathfrak{C}}(X)$,

    $E^{\mathfrak{C}}(X) = P_{\mathfrak{C}} X$, P-almost surely, (10.3.7)

that is, $\omega \mapsto \int X(\omega')\, P_{\mathfrak{C}}(\omega,d\omega')$ is a version of $E^{\mathfrak{C}}(X)$. This is obvious for
indicator variables $X = 1_A$ with $A \in \mathfrak{A}$. Using Theorem 2.3.6, (10.3.7)
follows in general by using known properties of conditional expectation.
(10.3.7) also holds for integrable numerical random variables X. We need
only decompose X into its positive and negative part and take into account
the integrability of $E^{\mathfrak{C}}(X^+)$ and $E^{\mathfrak{C}}(X^-)$.

Finally, we can also give an analog to the concept of distribution of a 
random variable: 
10.3.3. Definition. Let $X: (\Omega,\mathfrak{A}) \to (\Omega',\mathfrak{A}')$ be a random variable on
a probability space $(\Omega,\mathfrak{A},P)$ with values in a measurable space $(\Omega',\mathfrak{A}')$, and
let $\mathfrak{C}$ be a σ-subalgebra of $\mathfrak{A}$. Then every Markov kernel $P_{X|\mathfrak{C}}$ from
$(\Omega,\mathfrak{C})$ into $(\Omega',\mathfrak{A}')$, such that $\omega \mapsto P_{X|\mathfrak{C}}(\omega,A')$ is a version of $P^{\mathfrak{C}}(X^{-1}(A'))$
for each $A' \in \mathfrak{A}'$, is called a conditional distribution of X given $\mathfrak{C}$.


Examples 


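8. If X is the identity mapping of a probability space $(\Omega,\mathfrak{A},P)$ onto
itself, then every conditional distribution $P_{X|\mathfrak{C}}$ is an expectation kernel
$P_{\mathfrak{C}}$ and vice versa.


9. For the special σ-subalgebras $\mathfrak{C} = \mathfrak{C}_0 = \{\emptyset,\Omega\}$ and $\mathfrak{C} = \mathfrak{A}$, $P_{X|\mathfrak{C}}$
exists for every $(\Omega',\mathfrak{A}')$-random variable X on $(\Omega,\mathfrak{A},P)$: $P_{X|\mathfrak{C}_0}$ is the kernel
$(\omega,A') \mapsto P_X(A')$, independent of $\omega$, where $P_X$ is the distribution of X.
With every $\omega \in \Omega$, $P_{X|\mathfrak{A}}$ associates the unit mass $\varepsilon_{X(\omega)}$ on $\mathfrak{A}'$, that is,

    $P_{X|\mathfrak{A}}(\omega,A') = 1$ if $X(\omega) \in A'$, and $P_{X|\mathfrak{A}}(\omega,A') = 0$ if $X(\omega) \notin A'$, for $(\omega,A') \in \Omega \times \mathfrak{A}'$.

(Compare Example 4.)


10. If we again have the situation of Examples 3 and 5 of Section 10.1
and that of Example 5 of this section, then $P_{X|\mathfrak{C}}$ exists for every $(\Omega',\mathfrak{A}')$-random
variable X. Let $Q_{B_i} = X(P_{B_i})$ be the distribution of X under the probability
measure $P_{B_i}$ ($i \in I$). Then

    $P_{X|\mathfrak{C}}(\omega,A') = \sum_{i \in I} Q_{B_i}(A')\, 1_{B_i}(\omega)$ ($(\omega,A') \in \Omega \times \mathfrak{A}'$)

is a conditional distribution according to Example 5. The reader should
determine the conditional distribution also in the case of Example 6.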


If we are dealing with the situation described in Definition 10.3.3, then
for every $\mathfrak{A}'$-measurable numerical function $f' \ge 0$ on $\Omega'$ and every version
of the conditional expectation of $f' \circ X$:

    $E^{\mathfrak{C}}(f' \circ X) = P_{X|\mathfrak{C}} f'$, P-almost surely. (10.3.8)

It again suffices to treat the case $f' = 1_{A'}$ with $A' \in \mathfrak{A}'$. But then

    $E^{\mathfrak{C}}(1_{A'} \circ X) = E^{\mathfrak{C}}(1_{X^{-1}(A')}) = P^{\mathfrak{C}}(X^{-1}(A'))$

and

    $P_{X|\mathfrak{C}} 1_{A'}(\omega) = P_{X|\mathfrak{C}}(\omega,A') = P^{\mathfrak{C}}(X^{-1}(A'))(\omega)$, P-almost surely.


Formula (10.3.8) remains valid if we require the integrability of $f' \circ X$
instead of $f' \ge 0$. Then, again, we need only decompose $f'$ into its positive
and negative part.
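
Formula (10.3.8) can be checked in the partition setting of Example 10. The sketch below is built on assumptions made for illustration: a six-point space with uniform P, a two-block partition generating the σ-subalgebra, and X taking values in {0, 1}. It compares the kernel applied to a function f with the conditional expectation of f composed with X.

```python
from fractions import Fraction

# Hypothetical model: Omega = {0,...,5}, uniform P, partition into two blocks.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
blocks = [[0, 1, 2], [3, 4, 5]]
values = (0, 1)  # Omega' = {0, 1}

def X(w):
    return w % 2

def block_of(w):
    return next(b for b in blocks if w in b)

def cond_dist(w, A_prime):
    """Example 10: law of X under P(. | B_i), where B_i is the block containing w."""
    B = block_of(w)
    pB = sum(P[v] for v in B)
    return sum(P[v] for v in B if X(v) in A_prime) / pB

def kernel_apply(f, w):
    """(P_{X|C} f)(w): integral of f against the measure cond_dist(w, .)."""
    return sum(f(x) * cond_dist(w, {x}) for x in values)

def cond_exp(h, w):
    """Conditional expectation of h given the partition, evaluated at w."""
    B = block_of(w)
    pB = sum(P[v] for v in B)
    return sum(h(v) * P[v] for v in B) / pB

def f(x):
    return 10 * x + 1
```

In this finite model the two sides of (10.3.8) agree at every point, not just almost surely, because every block has positive probability.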
Now we proceed to the question of existence and uniqueness of expectation
kernels and conditional distributions. By Example 8, it suffices to
discuss these questions for the more general concept of conditional
distribution. For the question of uniqueness, we show:


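10.3.4. Theorem. Let $P_{X|\mathfrak{C}}$ and $P'_{X|\mathfrak{C}}$ be two conditional distributions
of a random variable $X: (\Omega,\mathfrak{A}) \to (\Omega',\mathfrak{A}')$ on a probability space $(\Omega,\mathfrak{A},P)$
given a σ-algebra $\mathfrak{C} \subset \mathfrak{A}$. Then:

(1) $P_{X|\mathfrak{C}}(\omega,A') = P'_{X|\mathfrak{C}}(\omega,A')$, P-almost surely for every $A' \in \mathfrak{A}'$.


(2) If the σ-algebra $\mathfrak{A}'$ has a countable generator $\mathfrak{E}'$, then there is a
P-null set $N \in \mathfrak{A}$ such that

    $P_{X|\mathfrak{C}}(\omega,A') = P'_{X|\mathfrak{C}}(\omega,A')$, for all $\omega \in \complement N$ and all $A' \in \mathfrak{A}'$. (10.3.9)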


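Proof. (1) follows directly from the observation that for fixed
$A' \in \mathfrak{A}'$, $\omega \mapsto P_{X|\mathfrak{C}}(\omega,A')$ is a version of $P^{\mathfrak{C}}(X^{-1}(A'))$ and that any two
such versions coincide P-almost surely.

(2) The countable generator $\mathfrak{E}'$ of $\mathfrak{A}'$ can be assumed $\cap$-stable without
loss of generality. If necessary, we replace $\mathfrak{E}'$ by the system of all finite
intersections of elements from $\mathfrak{E}'$. By (1), for every $E' \in \mathfrak{E}'$, there is a
P-null set $N_{E'} \in \mathfrak{A}$ such that $P_{X|\mathfrak{C}}(\omega,E') = P'_{X|\mathfrak{C}}(\omega,E')$ for all $\omega \in \complement N_{E'}$.
The countable union $N = \bigcup_{E' \in \mathfrak{E}'} N_{E'}$ is then a P-null set such that

    $P_{X|\mathfrak{C}}(\omega,E') = P'_{X|\mathfrak{C}}(\omega,E')$

for all $\omega \in \complement N$ and all $E' \in \mathfrak{E}'$. But then, by the Uniqueness Theorem
5.5, the above equality holds for all $\omega \in \complement N$ and all $A' \in \mathfrak{A}'$. ∎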


In particular, the σ-algebra of Borel sets of a Polish space $\Omega' = E$ is
countably generated. If we choose a countable, dense subset D of E, then
the system of all open balls in E with rational radii and centers in D is
obviously a generator of $\mathfrak{B}(E)$. In this case we can now also prove the
existence of a conditional distribution in general. The proof depends
primarily on the familiar regularity of finite Borel measures on Polish
spaces.


10.3.5. Theorem. Let $X: (\Omega,\mathfrak{A}) \to (E,\mathfrak{B}(E))$ be a random variable
on a probability space $(\Omega,\mathfrak{A},P)$ with values in a Polish space E. Then for
every σ-algebra $\mathfrak{C} \subset \mathfrak{A}$ there exists a conditional distribution $P_{X|\mathfrak{C}}$. This
is uniquely determined in the sense of (10.3.9).
Proof. First we show that the σ-algebra $\mathfrak{B} = \mathfrak{B}(E)$ has a countable
generator $\mathfrak{R}$ which is an algebra in E. To this end, let $\mathfrak{G}$ be a countable base
of E containing E, $\mathfrak{G}_c$ the system of all sets $\complement G$ with $G \in \mathfrak{G}$, and $\mathfrak{G}' =
\mathfrak{G} \cup \mathfrak{G}_c$. Then $\mathfrak{G}'$ is also countable; $\mathfrak{G}'$ contains $\emptyset$ and E. The system $\mathfrak{R}$ of
all sets of the form

    $\bigcup_{i=1}^{n} \bigcap_{j=1}^{n} G_{ij}$ with $G_{ij} \in \mathfrak{G}'$, $i,j = 1, \dots, n$; $n \in \mathbb{N}$,

yields the desired result. We only have to observe that by the set-theoretic
distributive laws, all sets $\bigcap_{i=1}^{n} \bigcup_{j=1}^{n} H_{ij}$ with $H_{ij} \in \mathfrak{G}'$ ($i,j = 1, \dots, n$;
$n \in \mathbb{N}$) also lie in $\mathfrak{R}$. Suppose $\mathfrak{R} = \{R_i : i \in I\}$ is an enumeration of $\mathfrak{R}$.
Here we let I denote the set $\mathbb{N}$ of natural numbers or a finite segment
$\{1, 2, \dots, n\}$ of $\mathbb{N}$.

The distribution $\mu = P_X$ of X is regular by Theorem 7.3.3. Therefore,
for each $i \in I$ there is an isotone sequence $(K_{ij})_{j=1,2,\dots}$ of compact subsets
of $R_i$ such that

    $\mu(R_i) = \sup_{j \in \mathbb{N}} \mu(K_{ij})$.

If $\mathfrak{R}^*$ denotes the algebra generated in E by $\mathfrak{R}$ and all $K_{ij}$, $i \in I$, $j \in \mathbb{N}$,
then this is again countable. To see this, we repeat the above reasoning
with $\mathfrak{R} \cup \{K_{ij} : i \in I, j \in \mathbb{N}\}$ instead of $\mathfrak{G}$. After these preliminaries, for
every set $B \in \mathfrak{B}$ we determine a version $P^{\mathfrak{C}}(X^{-1}(B))$ of the conditional
probability of $X^{-1}(B)$ given $\mathfrak{C}$ and set

    $P(\omega,B) = P^{\mathfrak{C}}(X^{-1}(B))(\omega)$ ($(\omega,B) \in \Omega \times \mathfrak{B}$).


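Now, by (10.1.25),

    $P\bigl(\omega, \bigcup_{i=1}^{k} B_i\bigr) = \sum_{i=1}^{k} P(\omega,B_i)$, P-almost surely

on $\Omega$ for every finite family of pairwise disjoint sets $B_1, \dots, B_k \in \mathfrak{B}$.
Because $\mathfrak{R}^*$ is countable, there are only countably many such finite
families $B_1, \dots, B_k$ of sets in $\mathfrak{R}^*$. Hence there is a P-null set $N_0$ such
that $B \mapsto P(\omega,B)$ is finitely additive on $\mathfrak{R}^*$ for every $\omega \in \complement N_0$. Due to
the $\mathfrak{C}$-measurability of $\omega \mapsto P(\omega,B)$, we can choose $N_0$ in $\mathfrak{C}$. For each
$i \in I$, the isotone sequence $(1_{K_{ij}})_{j=1,2,\dots}$ converges to $1_{R_i}$ $\mu$-almost surely,
and thus the sequence $(1_{X^{-1}(K_{ij})})_{j=1,2,\dots}$ converges to $1_{X^{-1}(R_i)}$ P-almost
surely. By (10.1.8) and (10.1.13) we then have

    $\sup_{j \in \mathbb{N}} P^{\mathfrak{C}}(X^{-1}(K_{ij})) = \sup_{j \in \mathbb{N}} E^{\mathfrak{C}}(1_{X^{-1}(K_{ij})}) = E^{\mathfrak{C}}(1_{X^{-1}(R_i)})$
    $= P^{\mathfrak{C}}(X^{-1}(R_i))$, P-almost surely.

Thus for every $i \in I$ there is a P-null set $N_i \in \mathfrak{C}$ such that

    $\sup_{j \in \mathbb{N}} P(\omega,K_{ij}) = P(\omega,R_i)$, for all $\omega \in \complement N_i$.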
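Finally, by (10.1.23) there is another P-null set $N_\infty \in \mathfrak{C}$ such that

    $P(\omega,E) = 1$, for all $\omega \in \complement N_\infty$.

But then $N = N_0 \cup \bigcup_{i \in I} N_i \cup N_\infty$ is a P-null set in $\mathfrak{C}$ such that $B \mapsto
P(\omega,B)$ is a content on $\mathfrak{R}^*$ with $P(\omega,E) = 1$ for all $\omega \in \complement N$. The restriction
of this content to $\mathfrak{R}$ is, as we shall show, $\emptyset$-continuous and hence a
premeasure on $\mathfrak{R}$. For this, let $(B_n)$ be a sequence in $\mathfrak{R}$ with $B_n \downarrow \emptyset$ and
let $\omega \in \complement N$. Since $\omega$ does not lie in any $N_i$ ($i \in I$), then for each $\varepsilon > 0$
and $n = 1, 2, \dots$, there is a compact set $K_n$ (namely, a suitable $K_{ij}$)
such that $K_n \subset B_n$ and

    $P(\omega,B_n) - P(\omega,K_n) = P(\omega, B_n \setminus K_n) \le 2^{-n} \varepsilon$.

Since $B_n \downarrow \emptyset$, we have $\bigcap_{n=1}^{\infty} K_n = \emptyset$. The compactness of all $K_n$ then
yields the existence of an $n_0 \in \mathbb{N}$ with $K_1 \cap \cdots \cap K_{n_0} = \emptyset$, that is,
with $B_{n_0} \subset \bigcup_{i=1}^{n_0} (B_i \setminus K_i)$. But since $B \mapsto P(\omega,B)$ is a finite content even
on $\mathfrak{R}^*$, it follows that

    $P(\omega,B_{n_0}) \le \sum_{i=1}^{n_0} P(\omega, B_i \setminus K_i) \le \sum_{i=1}^{n_0} 2^{-i} \varepsilon \le \varepsilon$.

Therefore $\inf_{n \in \mathbb{N}} P(\omega,B_n) = 0$, as asserted.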
nEN 


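The premeasure $B \mapsto P(\omega,B)$ on $\mathfrak{R}$ can now be extended by Theorem
1.5.7 (in exactly one way) to a probability measure $B \mapsto Q(\omega,B)$ on $\mathfrak{B}$
($\omega \in \complement N$). We also define $Q(\omega,B)$ for elements $\omega \in N$ by arbitrarily
choosing (in the case $N \ne \emptyset$) an $\omega_0 \in N$ and setting

$$Q(\omega,B) = \varepsilon_{X(\omega_0)}(B). \tag{10.3.9}$$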


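Thus, for every $\omega \in N$, the measure $B \mapsto Q(\omega,B)$ is equal to $\varepsilon_{X(\omega_0)}$. Therefore,
$Q$ has the property (10.3.2) of a Markov kernel from $(\Omega,\mathfrak{C})$ to $(E,\mathfrak{B})$,
that is, $B \mapsto Q(\omega,B)$ is a probability measure on $\mathfrak{B}$ for all $\omega \in \Omega$. But the
function $\omega \mapsto Q(\omega,B)$ is also $\mathfrak{C}$-measurable for every $B \in \mathfrak{B}$. By the above
construction, since $N \in \mathfrak{C}$, this holds at least for all $B \in \mathfrak{R}$. The system
$\mathfrak{D}$ of all $D \in \mathfrak{B}$ for which $\omega \mapsto Q(\omega,D)$ is $\mathfrak{C}$-measurable is obviously a
Dynkin system. Since $\mathfrak{R} \subset \mathfrak{D}$ and because of the $\cap$-stability of $\mathfrak{R}$, we have
$\mathfrak{B} = \mathfrak{A}(\mathfrak{R}) = \mathfrak{D}(\mathfrak{R}) \subset \mathfrak{D} \subset \mathfrak{B}$ by Theorem 1.2.3, that is, $\mathfrak{D} = \mathfrak{B}$. Thus, $Q$ is a
Markov kernel from $(\Omega,\mathfrak{C})$ to $(E,\mathfrak{B})$. $Q$ also has the last property of a
conditional distribution of $X$ given $\mathfrak{C}$: According to the construction of $Q$,
for every $B \in \mathfrak{R}$, $\omega \mapsto P(\omega,B)$ and hence $\omega \mapsto Q(\omega,B)$ is a version of
$P^{\mathfrak{C}}(X^{-1}(B))$, and thus

$$\int_Z Q(\omega,B)\, P(d\omega) = P(Z \cap X^{-1}(B))$$

for all $B \in \mathfrak{R}$ and $Z \in \mathfrak{C}$. For every $Z \in \mathfrak{C}$, $B \mapsto \int_Z Q(\omega,B)\, P(d\omega)$ and
$B \mapsto P(Z \cap X^{-1}(B))$ are finite measures on $\mathfrak{B}$ which coincide on $\mathfrak{R}$. Then,
by the Uniqueness Theorem, the two measures coincide on $\mathfrak{B}$. The last
equality thus also holds for all $B \in \mathfrak{B}$ and $Z \in \mathfrak{C}$. Therefore $\omega \mapsto Q(\omega,B)$
is a version of $P^{\mathfrak{C}}(X^{-1}(B))$ for all $B \in \mathfrak{B}$. As asserted, $P_{X|\mathfrak{C}} = Q$ yields
the desired result. The uniqueness follows from Theorem 10.3.4 as
mentioned. ∎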


In order to demonstrate the usefulness of conditional distributions, we 
now prove the so-called Jensen inequality for ordinary and conditional 
expectations. 

First we recall the following well-known notion: A real function $q$
defined on an interval $J \subset \mathbb{R}$ is said to be concave if

$$q(\alpha x + (1-\alpha) y) \ge \alpha q(x) + (1-\alpha) q(y)$$

for any two points $x, y \in J$ and all numbers $\alpha \in [0,1]$. Using induction,
we can prove easily that this is equivalent to the following requirement:
For any finite number of points $x_1, \ldots, x_k \in J$ and numbers
$\alpha_1, \ldots, \alpha_k \in \mathbb{R}_+$ with $\alpha_1 + \cdots + \alpha_k = 1$,

$$\sum_{i=1}^{k} \alpha_i q(x_i) \le q\Bigl(\sum_{i=1}^{k} \alpha_i x_i\Bigr).$$

We say that $q$ is convex if $-q$ is concave.


10.3.6. Theorem. Let $X$ be a real, integrable random variable on a
probability space $(\Omega,\mathfrak{A},P)$ with values in an arbitrary interval $J \subset \mathbb{R}$.
Then $E(X)$ lies in $J$, and for every continuous concave function $q$ on $J$,

$$E(q \circ X) \le q(E(X)), \tag{10.3.10}$$

provided $q \circ X$ is integrable.


Proof. 1. $E(X)$ lies in $J$. This follows from the following two remarks:
(i) For every real number $\alpha$, the relation $X \le \alpha$ (or $\alpha \le X$) implies
$E(X) \le \alpha$ (or $\alpha \le E(X)$); (ii) $X(\omega) < \alpha$ (or $\alpha < X(\omega)$) for all $\omega \in \Omega$
implies $E(X) < \alpha$ (or $\alpha < E(X)$). Remark (i) is an immediate consequence
of the isotonicity of integrals. From $X(\omega) < \alpha$ for all $\omega \in \Omega$, we conclude
that $E(X) = \alpha$ is impossible, and hence $E(X) < \alpha$. Indeed, $E(X) = \alpha$
would imply $E(\alpha - X) = 0$; hence, the integrand $\alpha - X \ge 0$ would
vanish $P$-almost surely. But $\{X < \alpha\} = \Omega$ has probability 1. The same
argument proves the second part of (ii).


2. Next we prove (10.3.10) for the case of a compact interval $J$. We
consider $X$ as a random variable with values in the compact subspace $J$
of $\mathbb{R}$ and correspondingly consider the distribution $P_X$ as a probability
measure on $J$. By Corollary 7.7.4 and Theorem 7.8.4, there exists a
sequence $(\mu_n)$ of discrete probability measures on $J$ which converges
vaguely and hence weakly to $P_X$. Consequently, we have, on one hand,

$$\text{(a)} \qquad \lim_{n \to \infty} \int x\, \mu_n(dx) = \int x\, P_X(dx) = E(X)$$

and on the other

$$\text{(b)} \qquad \lim_{n \to \infty} \int q\, d\mu_n = \int q\, dP_X = E(q \circ X).$$

Every measure $\mu_n$ is of the form $\sum_{i=1}^{k} \alpha_i \varepsilon_{a_i}$, where $a_1, \ldots, a_k \in J$,
$\alpha_1, \ldots, \alpha_k \in \mathbb{R}_+$, and $\sum \alpha_i = 1$. (Here all the $a_i$ and $\alpha_i$ depend on both $k$
and $n$.) Therefore, $m_n = \int x\, \mu_n(dx) = \sum_{i=1}^{k} \alpha_i a_i \in J$ since $J$ is a convex
set. From this and (a) it follows that $E(X) \in J$ because $J$ is closed.

Moreover,

$$\int q\, d\mu_n = \sum_{i=1}^{k} \alpha_i q(a_i) \le q\Bigl(\sum_{i=1}^{k} \alpha_i a_i\Bigr) = q(m_n)$$

due to the concavity of $q$. From $\lim m_n = E(X)$, the continuity of $q$, and
from (b), it follows that $E(q \circ X) \le q(E(X))$, that is, we have the Jensen
inequality.
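The discrete step of part 2 can be checked numerically. The following is a minimal sketch, not part of the proof: the function name `jensen_gap`, the atoms, the weights, and the choice $q = \sqrt{\cdot}$ on $J = [0,4]$ are illustrative assumptions.

```python
import math

def jensen_gap(atoms, weights, q):
    """Return q(m) - integral of q for the discrete measure sum_i weights[i] * eps_{atoms[i]}.

    Here m = sum_i weights[i] * atoms[i] plays the role of m_n in the text.
    For a concave q the returned value is >= 0.
    """
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    mean = sum(w * a for w, a in zip(weights, atoms))           # m_n
    integral_q = sum(w * q(a) for w, a in zip(weights, atoms))  # integral of q d(mu_n)
    return q(mean) - integral_q

# Hypothetical data: three atoms in J = [0, 4] and the concave function sqrt.
atoms = [0.0, 1.0, 4.0]
weights = [0.25, 0.5, 0.25]
gap = jensen_gap(atoms, weights, math.sqrt)
affine_gap = jensen_gap(atoms, weights, lambda x: 2 * x + 1)
```

For an affine $q$ the gap vanishes, which matches the fact that affine functions are both concave and convex.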


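3. The general case can now be derived from 2: There exists an isotone
sequence $(J_n)_{n \in \mathbb{N}}$ of compact intervals in $\mathbb{R}$ such that $J_n \uparrow J$. Then
$\Omega_n = \{X \in J_n\}$ defines a sequence in the $\sigma$-algebra $\mathfrak{A}$ such that $\Omega_n \uparrow \Omega$.
Thus $\lim P(\Omega_n) = 1$, and we can assume $P(\Omega_n) > 0$ for all $n \in \mathbb{N}$. Since
the restriction $P_n$ of $[1/P(\Omega_n)]P$ to $\Omega_n \cap \mathfrak{A}$ is a probability measure and
since the continuous function $q$ is bounded on the compact interval $J_n$,
the restriction of $q \circ X$ to $\Omega_n$ is an integrable random variable on the
probability space $(\Omega_n, \Omega_n \cap \mathfrak{A}, P_n)$. Thus we know from 2

$$\frac{1}{P(\Omega_n)} \int_{\Omega_n} q \circ X\, dP \le q\Bigl(\frac{1}{P(\Omega_n)} \int_{\Omega_n} X\, dP\Bigr).$$

Now $\lim P(\Omega_n) = 1$, $\lim \int_{\Omega_n} q \circ X\, dP = E(q \circ X)$, and $\lim \int_{\Omega_n} X\, dP = E(X)$.
Since $q$ is continuous on $J$, we obtain (10.3.10) as $n \to \infty$. ∎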
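10.3.7. Corollary. If $X$ and $q$ are as above, then for every $\sigma$-subalgebra
$\mathfrak{C}$ of $\mathfrak{A}$:

$$E^{\mathfrak{C}}(X) \in J, \qquad P\text{-almost surely;} \tag{10.3.11}$$
$$E^{\mathfrak{C}}(q \circ X) \le q(E^{\mathfrak{C}}(X)), \qquad P\text{-almost surely.} \tag{10.3.12}$$

Proof. By Theorem 10.3.5 there exists a conditional distribution $P_{X|\mathfrak{C}}$,
considered as a kernel from $(\Omega,\mathfrak{C})$ to $(J,\mathfrak{B}(J))$. Let $Q_\omega$ denote the probability
measure $A \mapsto P_{X|\mathfrak{C}}(\omega,A)$ on $\mathfrak{B}(J)$ ($\omega \in \Omega$). Then by (10.3.8) we have

$$[E^{\mathfrak{C}}(X)](\omega) = \int x\, Q_\omega(dx), \qquad P\text{-almost surely}$$

and also

$$[E^{\mathfrak{C}}(q \circ X)](\omega) = \int q(x)\, Q_\omega(dx), \qquad P\text{-almost surely.}$$

Since $(J,\mathfrak{B}(J),Q_\omega)$ is a probability space for every $\omega \in \Omega$, and $x \mapsto x$ is
a random variable on it, then by Theorem 10.3.6

$$\int x\, Q_\omega(dx) \in J$$

and

$$\int q(x)\, Q_\omega(dx) \le q\Bigl(\int x\, Q_\omega(dx)\Bigr), \qquad P\text{-almost surely.}$$

But these four results imply (10.3.11) and (10.3.12). ∎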


Remark. The function $x \mapsto |x|$ is convex on $\mathbb{R}$. Therefore, we again
obtain the familiar inequalities $|E(X)| \le E(|X|)$ and $|E^{\mathfrak{C}}(X)| \le E^{\mathfrak{C}}(|X|)$
almost surely.


PROBLEMS 


1. Let $X: (\Omega,\mathfrak{A}) \to (E,\mathfrak{B}(E))$ and $Y: (\Omega,\mathfrak{A}) \to (\Omega',\mathfrak{A}')$ be random variables
on a probability space $(\Omega,\mathfrak{A},P)$ with values in a Polish space $E$ or an
arbitrary measurable space $(\Omega',\mathfrak{A}')$, respectively. Imitate the proof of
Theorem 10.3.5 and prove the existence of a kernel $Q$ from $(\Omega',\mathfrak{A}')$ to
$(E,\mathfrak{B}(E))$ such that $y \mapsto Q(y,B)$ is a version of $y \mapsto P(X^{-1}(B) \mid Y = y)$
for all $B \in \mathfrak{B}(E)$. For each $y \in \Omega'$, the probability measure
$B \mapsto Q(y,B)$ on $\mathfrak{B}(E)$ is then called a conditional distribution of $X$
given that $Y$ equals $y$. It is denoted by $P_{X|Y=y}$; hence, for a given
version of $y \mapsto P(X^{-1}(B) \mid Y = y)$, we have

$$P_{X|Y=y}(B) = P(X^{-1}(B) \mid Y = y), \qquad P_Y\text{-almost surely.}$$

Why is then $(\omega,B) \mapsto P_{X|Y=Y(\omega)}(B)$ a conditional distribution $P_{X|Y}$?

2. Consider the situation of Problem 1 and denote by $\mu_y$ the probability
measure $P_{X|Y=y}$. Prove that the joint distribution $P_{X \otimes Y}$ of $X$ and $Y$
is given by the formula

$$P_{X \otimes Y}(M) = \int \Bigl[\int 1_M(x,y)\, \mu_y(dx)\Bigr] P_Y(dy) \qquad (M \in \mathfrak{B}(E) \otimes \mathfrak{A}').$$

Observe that $y \mapsto \int 1_M(x,y)\, \mu_y(dx)$ is $\mathfrak{A}'$-measurable. Express $\int f\, dP_{X \otimes Y}$
for nonnegative measurable functions and $P_{X \otimes Y}$-integrable functions
$f$ on $E \times \Omega'$ in terms of $\mu_y$ and $P_Y$.

3. Consider again the situation of Problem 1. Prove: $X$ and $Y$ are
independent if and only if

$$P_{X|Y=y} = P_X, \qquad P_Y\text{-almost surely.}$$


4. Let $X$ and $Y$ be real random variables on $(\Omega,\mathfrak{A},P)$. Assume that $X$ has
voor aS distribution and that the conditional distribution Px|y equals 
vy». Prove that the joint distribution of X and Y has the following 
density with respect to \?: 


_1(2 99^). 
(x,y) nce Hats 
2moT 
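5. Let $X_1, \ldots, X_n$ be $n$ real random variables on $(\Omega,\mathfrak{A},P)$ such that
their joint distribution has density $d$ with respect to $\lambda^n$. Introduce
$X = X_1 \otimes \cdots \otimes X_p$ and $Y = X_{p+1} \otimes \cdots \otimes X_n$ for some
$p \in \{1, \ldots, n-1\}$, and denote by $d_Y$ the function

$$(x_{p+1}, \ldots, x_n) \mapsto \int d(x_1, \ldots, x_n)\, \lambda^p(dx_1, \ldots, dx_p)$$

on $\mathbb{R}^{n-p}$. Prove:
(a) $d_Y \lambda^{n-p}$ is a probability measure on $\mathfrak{B}^{n-p}$.
(b) $d_Y \ne 0$, $P_Y$-almost surely.
(c) For $P_Y$-almost all $y = (x_{p+1}, \ldots, x_n) \in \mathbb{R}^{n-p}$ we have

$$P_{X|Y=y} = d_y \lambda^p,$$

where

$$d_y(x_1, \ldots, x_p) = \frac{d(x_1, \ldots, x_n)}{d_Y(x_{p+1}, \ldots, x_n)}.$$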

6. Deduce from Minkowski's inequality (2.6.4) that $x \mapsto |x|^p$ is convex on
$\mathbb{R}$ for all $p \ge 1$. Prove that for every random variable $X \in \mathcal{L}^p(P)$,
defined on a probability space $(\Omega,\mathfrak{A},P)$, and every $\sigma$-subalgebra $\mathfrak{C}$ of
$\mathfrak{A}$ the inequality

$$|E^{\mathfrak{C}}(X)|^p \le E^{\mathfrak{C}}(|X|^p)$$

holds almost surely.


11 


MARTINGALES 


The theory of martingales established by J. L. Doob (see [11] and [39])
is one of the principal tools of the theory of stochastic processes. It provides
a unified method for dealing with various limit theorems of probability
theory. In particular, the concept of martingale helps us to study
theorems connected with the law of large numbers in a new light. We are
more or less inevitably led to the concept of martingale through the study
of the convergence behavior of conditional expectations $E^{\mathfrak{A}_n}(X)$ relative
to an isotone sequence of $\sigma$-subalgebras $\mathfrak{A}_n$ of $\mathfrak{A}$.


11.1 DEFINITION AND EXAMPLES 


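Suppose we are given a probability space $(\Omega,\mathfrak{A},P)$, a partially ordered
set $I$,¹ a family $(\mathfrak{A}_t)_{t \in I}$ of $\sigma$-subalgebras $\mathfrak{A}_t$ of $\mathfrak{A}$, and a family $(X_t)_{t \in I}$ of
random variables $X_t: (\Omega,\mathfrak{A}) \to (\Omega',\mathfrak{A}')$ with values in a measurable space
$(\Omega',\mathfrak{A}')$.² We agree on the following terminology: The family $(\mathfrak{A}_t)_{t \in I}$ is
said to be isotone (or increasing) if

$$s \le t \implies \mathfrak{A}_s \subset \mathfrak{A}_t \qquad (s, t \in I). \tag{11.1.1}$$

The family $(X_t)_{t \in I}$ is said to be adapted to the family $(\mathfrak{A}_t)_{t \in I}$ if $X_t$ is
$\mathfrak{A}_t$-$\mathfrak{A}'$-measurable for every $t \in I$.

¹ Thus a relation $\le$ is defined in $I$ which is reflexive ($t \le t$ for all $t \in I$), antisymmetric
($s \le t$, $t \le s \implies s = t$), and transitive ($s \le t$, $t \le u \implies s \le u$). If $s \le t$ or $t \le s$
holds for every pair $(s,t) \in I \times I$, then $I$ is said to be totally ordered.

² We denote the families by $(\mathfrak{A}_t)$ and $(X_t)$ when there is no danger of ambiguity.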
Example

1. Let $(\Omega,\mathfrak{A},P)$, $I$ and $(X_t)_{t \in I}$ be given as above. For every $t \in I$, let

$$\mathfrak{A}_t^0 = \mathfrak{A}(X_s;\ s \le t)$$

be the $\sigma$-algebra generated by all $X_s$ with $s \le t$ ($s \in I$). Then the family
$(\mathfrak{A}_t^0)_{t \in I}$ is obviously isotone and $(X_t)$ is adapted to the family $(\mathfrak{A}_t^0)$. The
family $(X_t)_{t \in I}$ is obviously adapted to a given isotone family $(\mathfrak{A}_t)_{t \in I}$ of
$\sigma$-subalgebras of $\mathfrak{A}$ if and only if

$$\mathfrak{A}_t^0 \subset \mathfrak{A}_t \qquad \text{for all } t \in I.$$


11.1.1. Definition. Let $(\Omega,\mathfrak{A},P)$ be a probability space, $(\mathfrak{A}_t)_{t \in I}$ an isotone
family of $\sigma$-subalgebras of $\mathfrak{A}$, and $(X_t)_{t \in I}$ a family of real, integrable
random variables adapted to $(\mathfrak{A}_t)_{t \in I}$. We call $(X_t)$ a super-martingale
[relative to the family $(\mathfrak{A}_t)$] if one of the following equivalent conditions
is satisfied for all pairs $s, t \in I$ with $s \le t$:

$$E^{\mathfrak{A}_s}(X_t) \le X_s, \qquad P\text{-almost surely;} \tag{11.1.2}$$
$$\int_A X_t\, dP \le \int_A X_s\, dP, \qquad \text{for all } A \in \mathfrak{A}_s. \tag{11.1.3}$$

We call $(X_t)$ a sub-martingale [relative to $(\mathfrak{A}_t)$] if $(-X_t)$ is a super-martingale
relative to $(\mathfrak{A}_t)$. If $(X_t)$ is both a super-martingale and a sub-martingale,
then $(X_t)$ is called a martingale.³

Hence, in the definition of sub-martingale (martingale), the symbols
$\le$ in (11.1.2) and (11.1.3) should be replaced by $\ge$ ($=$). The following
remarks serve to clarify the definition further.

Remarks. 1. The equivalence of conditions (11.1.2) and (11.1.3) is
easy to see: For all $A \in \mathfrak{A}_s$, $\int_A E^{\mathfrak{A}_s}(X_t)\, dP = \int_A X_t\, dP$. Thus (11.1.3)
follows from (11.1.2). Conversely, (11.1.3) implies

$$\int_A \bigl(X_s - E^{\mathfrak{A}_s}(X_t)\bigr)\, dP \ge 0, \qquad \text{for all } A \in \mathfrak{A}_s.$$

Here the integrand is $\mathfrak{A}_s$-measurable. Therefore, we can choose

$$A = \{X_s - E^{\mathfrak{A}_s}(X_t) < 0\}.$$

The above inequality then shows that $P\{X_s - E^{\mathfrak{A}_s}(X_t) < 0\} = 0$, and
thus (11.1.2) holds.


2. Example 1 tells us that a family $(X_t)_{t \in I}$ of real, integrable random
variables is always adapted to the isotone family $(\mathfrak{A}_t^0)_{t \in I}$ of the $\sigma$-algebras
$\mathfrak{A}_t^0 = \mathfrak{A}(X_s;\ s \le t)$. Therefore we simply call $(X_t)$ a super-martingale,
and so on, that is, without reference to a family $(\mathfrak{A}_t)$, if we are dealing with
a super-martingale, and so on, relative to $(\mathfrak{A}_t^0)$. Every super-martingale
(martingale) relative to $(\mathfrak{A}_t)$ is obviously also a super-martingale (martingale)
relative to $(\mathfrak{A}_t^0)$.

³ The strong interrelations of the random variables $X_t$ of a martingale explain the
use of the term "martingale." Generally, martingale denotes a part of the bridle of a
horse, namely, that part which prevents the horse from tossing his head high.

3. Properties (11.1.2) and (11.1.3) are obvious (in fact, with equality) in
the case $s = t$. In verifying these properties we can therefore always
assume that $s < t$, that is, $s \le t$ and $s \ne t$.


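4. If $(X_n)_{n \in \mathbb{N}}$ is a sequence of real, integrable random variables adapted to
an isotone sequence $(\mathfrak{A}_n)_{n \in \mathbb{N}}$ of $\sigma$-algebras (with the set $\mathbb{N}$ of natural numbers,
ordered as usual, as index set), then the super-martingale properties (11.1.2)
and (11.1.3) follow from the validity of

$$E^{\mathfrak{A}_n}(X_{n+1}) \le X_n, \qquad \text{almost surely,}$$

for all $n \in \mathbb{N}$: By (10.1.18),

$$E^{\mathfrak{A}_n}(X_{n+p}) = E^{\mathfrak{A}_n}\bigl(E^{\mathfrak{A}_{n+1}}(\cdots E^{\mathfrak{A}_{n+p-1}}(X_{n+p}) \cdots)\bigr)$$

holds almost surely for every $p = 2, 3, \ldots$, and therefore

$$E^{\mathfrak{A}_n}(X_{n+p}) \le X_n, \qquad \text{almost surely}$$

follows from the inequality above. (Corresponding results hold for index
sets $I = \{1, \ldots, k\}$, $k \in \mathbb{N}$.)

5. For $A = \Omega$, (11.1.3) yields the assertion that for every super-martingale
$(X_t)_{t \in I}$, the expectations $E(X_t)$ depend antitonely on $t$:

$$s \le t \implies E(X_s) \ge E(X_t).$$

For a martingale, the expectations $E(X_t)$ are independent of $t$.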


Examples 


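2. Let $(\Omega,\mathfrak{A},P)$ be a probability space, $(\mathfrak{A}_t)_{t \in I}$ an isotone family of
$\sigma$-subalgebras of $\mathfrak{A}$, and $X$ a real, integrable random variable on $\Omega$. We
define $X_t = E^{\mathfrak{A}_t}(X)$ for every $t \in I$. Then $(X_t)$ is a martingale relative to
$(\mathfrak{A}_t)$. Indeed, every $X_t$ is $\mathfrak{A}_t$-measurable by definition. From $s \le t$ it follows
that $E^{\mathfrak{A}_s}(X_t) = E^{\mathfrak{A}_s}(E^{\mathfrak{A}_t}(X)) = E^{\mathfrak{A}_s}(X) = X_s$ almost surely according to
the smoothing property (10.1.18).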


3. Let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real, integrable random
variables on a probability space. Assume each $X_n$ is centered and thus
$E(X_n) = 0$ for all $n \in \mathbb{N}$. If $S_n = \sum_{i=1}^{n} X_i$ denotes the $n$th partial sum
of the sequence $(X_n)$, then $(S_n)_{n \in \mathbb{N}}$ is a martingale. To see this: For every
natural number $n$, according to Remark 4, we need to verify the equality

$$E(S_{n+1} \mid S_1, \ldots, S_n) = S_n, \qquad \text{almost surely.}$$

Now obviously $\mathfrak{A}(X_1, \ldots, X_n) = \mathfrak{A}(S_1, \ldots, S_n)$. Therefore, by Corollary
5.1.5, $X_{n+1}$ is independent of $(S_1, \ldots, S_n)$ and hence

$$E(X_{n+1} \mid S_1, \ldots, S_n) = E(X_{n+1}) = 0, \qquad \text{almost surely.}$$

However, for all $j \in \mathbb{N}$ with $1 \le j \le n$ we have $E(X_j \mid S_1, \ldots, S_n) = X_j$
almost surely. The assertion follows when we add these equalities.
The proof further shows the following: If we replace the conditions
$E(X_n) = 0$ by $E(X_n) \le 0$ ($\ge 0$) for all $n$, then $(S_n)$ is a super-(sub-)
martingale.
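For steps with values $\pm 1$ the criterion of Remark 4 can be verified by exact enumeration. The sketch below is only an illustration under assumptions of ours: identically distributed steps with $P\{X_i = +1\} = $ `p_up`, and the hypothetical helper name `conditional_increments`. It conditions on $(X_1, \ldots, X_n)$, which generates the same $\sigma$-algebra as $(S_1, \ldots, S_n)$.

```python
from fractions import Fraction
from itertools import product

def conditional_increments(n, p_up):
    """Map each history (x_1, ..., x_n) to E(S_{n+1} | history) - S_n, computed exactly.

    The conditional expectation is obtained from the joint law of
    (X_1, ..., X_{n+1}) as a ratio of sums, not from the independence shortcut.
    """
    step_prob = {1: p_up, -1: 1 - p_up}

    def prob(path):
        out = Fraction(1)
        for x in path:
            out *= step_prob[x]
        return out

    increments = {}
    for hist in product((1, -1), repeat=n):
        s_n = sum(hist)
        numerator = sum(prob(hist + (x,)) * (s_n + x) for x in (1, -1))
        increments[hist] = numerator / prob(hist) - s_n
    return increments

fair = conditional_increments(3, Fraction(1, 2))          # E(X_i) = 0
unfavourable = conditional_increments(3, Fraction(1, 3))  # E(X_i) = -1/3 <= 0
```

In the centered case every increment is $0$ (martingale); for $E(X_i) \le 0$ every increment is $\le 0$ (super-martingale).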


4. Martingales allow the following interpretation: Let $(X_n)_{n \in \mathbb{N}}$ be a
sequence of real, integrable random variables on a probability space.
Suppose the variable $X_n$ is interpreted as the amount a player wins on
the $n$th play. Then

$$S_n = X_1 + \cdots + X_n$$

is the fortune of the player after the first $n$ plays. For obvious reasons
we call the game fair if

$$E(X_1) = 0 \quad \text{and} \quad E(X_{n+1} \mid X_1, \ldots, X_n) = 0, \quad \text{almost surely } (n \in \mathbb{N}).$$

The same reasoning as in Example 3 then shows that $(S_n)_{n \in \mathbb{N}}$ is a martingale.
On the other hand, if we had

$$E(X_1) \ge 0 \quad \text{and} \quad E(X_{n+1} \mid X_1, \ldots, X_n) \ge 0, \quad \text{almost surely } (n \in \mathbb{N}),$$

then our player would have an advantage over his opponent. $(S_n)$ would
then be a sub-martingale.


5. For a probability space $(\Omega,\mathfrak{A},P)$, let $I$ denote the set of all (finite)
decompositions $t = (B_j)_{j=1,\ldots,n_t}$ of $\Omega$ into finitely many pairwise disjoint
sets $B_j \in \mathfrak{A}$. $I$ is partially ordered as follows: The relationship $t' \le t''$
holds between $t' = (B_j')$ and $t'' = (B_k'')$ if and only if $t''$ is finer than $t'$,
that is, if every set $B_j'$ is the union of certain sets $B_k''$. For every decomposition
$t = (B_j) \in I$, let $\mathfrak{A}_t$ denote the $\sigma$-algebra in $\Omega$ generated by all
$B_1, \ldots, B_{n_t}$ (consisting of finitely many sets) (see Section 10.1, p. 302).
In addition to $P$, suppose a finite content $Q$ is defined on $\mathfrak{A}$. If we set

$$X_s = \sum_{i=1}^{n_s} \frac{Q(B_i)}{P(B_i)}\, 1_{B_i}$$

for every $s = (B_i) \in I$, where the quotient $Q(B_i)/P(B_i)$ is set equal to
zero when $P(B_i) = 0$, then $(X_t)_{t \in I}$ is a super-martingale relative to
$(\mathfrak{A}_t)_{t \in I}$. First it is clear that the family $(\mathfrak{A}_t)$ is isotone and $(X_t)$ is adapted
to this family. If $s = (B_i)$ and $t = (C_j)$ are decompositions from $I$ such
that $s \le t$, then every $B_i$ is the union of certain $C_{j_\kappa}$, $\kappa = 1, \ldots, k$,
and hence

$$\int_{B_i} X_s\, dP = \frac{Q(B_i)}{P(B_i)}\, P(B_i) = \begin{cases} Q(B_i), & \text{if } P(B_i) > 0, \\ 0, & \text{if } P(B_i) = 0; \end{cases}$$

$$\int_{B_i} X_t\, dP = {\sum_{\kappa}}' \frac{Q(C_{j_\kappa})}{P(C_{j_\kappa})}\, P(C_{j_\kappa}) = {\sum_{\kappa}}' Q(C_{j_\kappa}).$$

The symbol $\sum'$ here denotes summation over all $\kappa = 1, \ldots, k$ with
$P(C_{j_\kappa}) > 0$. Therefore we have $\int_{B_i} X_t\, dP \le Q(B_i)$ and $\int_{B_i} X_t\, dP = 0$ when
$P(B_i) = 0$. We then obtain $\int_{B_i} X_t\, dP \le \int_{B_i} X_s\, dP$ for $i = 1, \ldots, n_s$.
Since every element from $\mathfrak{A}_s$ is the union of finitely many $B_i$, this proves
the inequality $E^{\mathfrak{A}_s}(X_t) \le X_s$ almost surely.

The above proof tells us that $(X_t)$ is a martingale relative to $(\mathfrak{A}_t)$ if
$P(A) = 0$ ($A \in \mathfrak{A}$) implies $Q(A) = 0$, that is, if (in the sense of Definition
2.9.5) the content $Q$ is $P$-continuous.
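Example 5 can be traced on a small finite space. In the following sketch the four-point space, the weights `P` and `Q`, and the helper names are illustrative assumptions; `Q` deliberately charges a $P$-null point, so the inequality is strict and $(X_t)$ is a super-martingale but not a martingale.

```python
from fractions import Fraction

def density_on_partition(partition, P, Q):
    """Return X_t as a dict omega -> Q(B)/P(B) for omega in block B (0 where P(B) = 0)."""
    X = {}
    for block in partition:
        p_block = sum(P[w] for w in block)
        q_block = sum(Q[w] for w in block)
        ratio = q_block / p_block if p_block > 0 else Fraction(0)
        for w in block:
            X[w] = ratio
    return X

def integral(X, P, block):
    """Integral of X over `block` with respect to P."""
    return sum(X[w] * P[w] for w in block)

# Hypothetical data on Omega = {0, 1, 2, 3}; the point 2 is P-null but has Q-mass 3.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(0), 3: Fraction(1, 4)}
Q = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3), 3: Fraction(1)}
s = [{0, 1, 2, 3}]       # coarse decomposition
t = [{0, 1}, {2}, {3}]   # finer decomposition, s <= t
X_s = density_on_partition(s, P, Q)
X_t = density_on_partition(t, P, Q)
lhs = integral(X_t, P, s[0])   # sum of Q(C) over blocks C of t with P(C) > 0
rhs = integral(X_s, P, s[0])   # Q(Omega)
```

Here `lhs` equals $Q(\{0,1\}) + Q(\{3\}) = 4$ and `rhs` equals $Q(\Omega) = 7$; the gap is exactly the $Q$-mass of the $P$-null block.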


The following properties of super-martingales make it possible to construct
further examples. Here we let $(X_t)_{t \in I}$ and $(Y_t)_{t \in I}$ be two super-martingales
or martingales on a probability space $(\Omega,\mathfrak{A},P)$ relative to the
same isotone family $(\mathfrak{A}_t)_{t \in I}$ of $\sigma$-subalgebras of $\mathfrak{A}$.

(11.1.4) When $(X_t)$ and $(Y_t)$ are super-martingales (or martingales),
$(\alpha X_t + \beta Y_t)$ is also a super-martingale (or martingale)
relative to $(\mathfrak{A}_t)$ ($\alpha, \beta \in \mathbb{R}_+$, or $\alpha, \beta \in \mathbb{R}$, respectively).

(11.1.5) When $(X_t)$ and $(Y_t)$ are super-martingales, $(\inf(X_t, Y_t))$ is
also a super-martingale.

(11.1.6) For every super-martingale $(X_t)$, $(X_t^-)$ is a sub-martingale.

(11.1.7) If $(X_t)$ is a super-martingale (martingale) with values in an
interval $J \subset \mathbb{R}$ [that is, $X_t(\Omega) \subset J$ for all $t \in I$] and $q: J \to \mathbb{R}$
is a continuous, isotone, concave (continuous concave)
function, then $(q \circ X_t)$ is a super-martingale relative to $(\mathfrak{A}_t)$
provided all random variables $q \circ X_t$ are integrable.

(11.1.4) follows directly from (11.1.2). For (11.1.5) we proceed as
follows: For any two elements $s, t \in I$ with $s \le t$ we have $E^{\mathfrak{A}_s}(X_t) \le X_s$
and $E^{\mathfrak{A}_s}(Y_t) \le Y_s$ almost surely. Then $E^{\mathfrak{A}_s}(\inf(X_t,Y_t)) \le E^{\mathfrak{A}_s}(X_t) \le X_s$
and also $E^{\mathfrak{A}_s}(\inf(X_t,Y_t)) \le E^{\mathfrak{A}_s}(Y_t) \le Y_s$ almost surely; therefore,
$E^{\mathfrak{A}_s}(\inf(X_t,Y_t)) \le \inf(X_s,Y_s)$ almost surely. If we choose all $Y_t$ in
(11.1.5) equal to 0, then (11.1.6) follows since $X_t^- = -\inf(X_t,0)$. Property
(11.1.7) is a direct consequence of the Jensen inequality (10.3.12).
As an example, $x \mapsto x^2$, $x \mapsto \sup(x,0)$, $x \mapsto \sup(-x,0)$, and $x \mapsto |x|$ are
continuous convex functions on $\mathbb{R}$. Therefore, for every martingale
$(X_t)_{t \in I}$, it is seen that $(X_t^2)$, $(X_t^+)$, $(X_t^-)$, and $(|X_t|)$ are sub-martingales.
Of course we must require square integrability of all $X_t$ in the first case.


Remark. 6. Sometimes super-martingales are defined by the requirement
(11.1.2) if the random variables $X_t$ are no longer integrable but all
$X_t \ge 0$ or have integrable negative part $X_t^-$. We do not go into the details
of this obvious generalization.


PROBLEMS 


1. Consider Pólya's urn model of Section 4.2, Problem 6. Let $X_0 =
b/(b + w)$ and let $X_n$ be the proportion of black balls attained in the
$n$th drawing. Prove that $(X_n)_{n=0,1,\ldots}$ is a martingale.

2. Let $(X_n)_{n \in \mathbb{N}}$ be a martingale relative to an isotone sequence $(\mathfrak{A}_n)_{n \in \mathbb{N}}$
of $\sigma$-subalgebras of $\mathfrak{A}$ where $(\Omega,\mathfrak{A},P)$ is the underlying probability
space. Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of bounded real random variables
adapted to the sequence $(\mathfrak{A}_n)$. Define by induction

$$Y_1 = X_1, \qquad Y_{n+1} = Y_n + f_n\,(X_{n+1} - X_n) \qquad (n \in \mathbb{N}).$$

Prove that $(Y_n)_{n \in \mathbb{N}}$ is a martingale relative to $(\mathfrak{A}_n)_{n \in \mathbb{N}}$.

3. Let $((X_t^n)_{t \in I})_{n \in \mathbb{N}}$ be a sequence of super-martingales on a probability
space $(\Omega,\mathfrak{A},P)$ with respect to one isotone family $(\mathfrak{A}_t)_{t \in I}$ of $\sigma$-subalgebras.
Assume that $|X_t^n| \le Y_t$ holds almost surely for all $n \in \mathbb{N}$
and $t \in I$, where each $Y_t$ is an integrable random variable. Prove that
$(\sup_{n \in \mathbb{N}} X_t^n)_{t \in I}$ resp. $(\inf_{n \in \mathbb{N}} X_t^n)_{t \in I}$ is a super-martingale provided that the
sequence $(X_t^n)_{n \in \mathbb{N}}$ is isotone resp. antitone for each $t \in I$.

4. Let $(X_t)_{t \in I}$ be a sub-martingale on $(\Omega,\mathfrak{A},P)$ of random variables $X_t \ge 0$
in $\mathcal{L}^p(P)$, $1 \le p < +\infty$. Prove that $(X_t^p)_{t \in I}$ is again a sub-martingale.

5. Let $(X_t)_{t \in I}$ be a martingale of random variables $X_t \in \mathcal{L}^p(P)$, $1 \le
p < +\infty$. Prove that $(|X_t|^p)_{t \in I}$ is a sub-martingale.


11.2 TRANSFORMATION BY STOPPING TIMES 


Example 4 of the preceding section tells us that we can interpret a
martingale $(X_n)_{n \in \mathbb{N}}$ as a fair game, where $X_n$ signifies the fortune of the
player at time $n$. The player can now decide to terminate the game at
some random point of time $T$ (depending possibly on his mood or on the
course of the game up to then). Thus $T$ is a random variable with values
in $\mathbb{N}$. The run of total wins by the player is then represented by the
sequence $(X_n^*)_{n \in \mathbb{N}}$ of random variables

$$X_n^*(\omega) = \begin{cases} X_n(\omega), & \omega \in \{T \ge n\}, \\ X_{T(\omega)}(\omega), & \omega \in \{T < n\}. \end{cases}$$

However, the player may also want to test the course of the game by
sampling his fortune at certain times $T_1 \le T_2 \le \cdots$ (all in $\mathbb{N}$) which
appear favorable to him and which are again random variables. He may
ask himself whether the fair course of the game can be changed to his
advantage by passing to $(X_{T_n})$. Thus he will ask whether a sub-martingale
$(X_{T_n})$ which is not a martingale can be obtained from the martingale $(X_n)$.
The second construction is a generalization of the first: We need only set
$T_n = \inf(T,n)$ to obtain $X_n^*(\omega) = X_{T_n(\omega)}(\omega)$ for all $n \in \mathbb{N}$ and $\omega \in \Omega$.⁴
Since our player is not supposed to have the gift of prophecy, the
random points of time $T$ or $T_n$ contain "no anticipation of the future."
The meaning of this will be made precise immediately in the concept of a
stopping time. With the help of this clarification, we shall be able to show
the invariance of the martingale property in the transition from $(X_n)$ to
$(X_{T_n})$ and thus also to $(X_n^*)$. For the sake of simplicity we shall restrict
consideration to the case in which the index set is $\{1, \ldots, k\}$ instead of $\mathbb{N}$.
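For a single path the passage to the stopped sequence is a one-line computation. The following sketch uses the identity $X_n^* = X_{\inf(T,n)}$ stated above; the function name `stopped_path` and the sample values are our own illustrative choices.

```python
def stopped_path(path, T):
    """Given path = [X_1, ..., X_k] for one fixed omega and the value T = T(omega)
    in {1, ..., k}, return [X*_1, ..., X*_k] with X*_n = X_{min(T, n)}."""
    if not 1 <= T <= len(path):
        raise ValueError("T must be an index in 1..k")
    return [path[min(T, n) - 1] for n in range(1, len(path) + 1)]

# Hypothetical fortunes X_1..X_5 of a player who stops at time T = 3.
frozen = stopped_path([1, 2, 1, 0, -1], 3)
```

After time $T$ the fortune stays frozen at $X_T$; with $T = k$ the path is unchanged.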


11.2.1. Definition. Let $(\mathfrak{A}_t)_{t \in I}$ be an isotone family of $\sigma$-algebras in a
set $\Omega$. A stopping time (relative to this family) is a mapping $T: \Omega \to I$
of $\Omega$ into the partially ordered index set $I$ such that

$$\{T \le t\} \in \mathfrak{A}_t \qquad \text{for all } t \in I.^5 \tag{11.2.1}$$

If $I$ is an interval in $\mathbb{R}$ or a subset of the natural numbers, then (11.2.1)
implies the measurability of $T$ relative to every $\sigma$-algebra $\mathfrak{A}$ in $\Omega$ with
$\mathfrak{A} \supset \bigcup_{t \in I} \mathfrak{A}_t$. Condition (11.2.1) can be given the interpretation that the
random variable $T$ does not contain any knowledge of the future. Here
the $\sigma$-algebra $\mathfrak{A}_t$ embodies "random events up to time $t$." Therefore, it is
not surprising that one generally determines a stopping time $T$ to be the
point of time at which some given random phenomenon is observed for
the first time. Example 4 below and the stopping times to be constructed
in the proof of Theorem 11.3.2 illuminate this observation.

⁴ The transition from $(X_n)$ to $(X_n^*)$ [$(X_{T_n})$] is called optional stopping [optional
sampling].

⁵ $\{T \le t\}$ of course means the set of all $\omega \in \Omega$ with $T(\omega) \le t$. We are often forced
(see Example 4) to adjoin an element $+\infty$ to $I$ which, by definition, must satisfy the
relationships $t \le +\infty$ for all $t \in I$. Then a mapping $T: \Omega \to I \cup \{+\infty\}$ which satisfies
(11.2.1) is still called a stopping time. But this is not a real generalization. We need
only set $\mathfrak{A}_{+\infty} = \mathfrak{A}(\mathfrak{A}_t;\ t \in I)$ to obtain a stopping time relative to $(\mathfrak{A}_t)_{t \in I \cup \{+\infty\}}$ in the
sense of our definition.
Examples

1. Every constant mapping $T: \Omega \to I$ is a stopping time relative to an
arbitrary isotone family $(\mathfrak{A}_t)_{t \in I}$ of $\sigma$-algebras in $\Omega$. If $t_0$ is the constant
value of $T$, then

$$\{T \le t\} = \begin{cases} \Omega, & \text{if } t_0 \le t, \\ \emptyset, & \text{otherwise} \end{cases} \qquad (t \in I).$$

2. If $I$ is totally ordered, then when $S$ and $T$ are stopping times,
$\inf(S,T)$ and $\sup(S,T)$ are also stopping times (relative to the same
family $(\mathfrak{A}_t)_{t \in I}$). Indeed, we have

$$\{\inf(S,T) \le t\} = \{S \le t\} \cup \{T \le t\}$$

and

$$\{\sup(S,T) \le t\} = \{S \le t\} \cap \{T \le t\}.$$
3. In the case $I = \mathbb{N}$, when $S$ and $T$ are stopping times, $S + T$ is also a
stopping time. This follows from

$$\{S + T \le t\} = \bigcup_{\substack{u, v \in \mathbb{N} \\ u + v \le t}} \{S = u\} \cap \{T = v\}.$$


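4. Let $I = \mathbb{N} \cup \{+\infty\}$ and let $(X_t)_{t \in I}$ be a family of real random
variables on a probability space adapted to the family $(\mathfrak{A}_t)_{t \in I}$; thus, in
particular, we have $\mathfrak{A}_t \subset \mathfrak{A}$ for all $t \in I$. For a set $A \in \mathfrak{B}^1$, let $T_A$ denote
the first point of time at which $X_t$ lies in $A$, that is, for each $\omega \in \Omega$ let

$$T_A(\omega) = \begin{cases} \inf\{t \in I: X_t(\omega) \in A\}, & \text{if such } t \text{ exist,} \\ +\infty, & \text{otherwise.} \end{cases}$$

Then we call $T_A$ the first entry time of $A$. This is a stopping time since

$$\{T_A \le t\} = \bigcup_{\tau=1}^{t} \{X_\tau \in A\}$$

for every $t \in \mathbb{N}$ and $\{T_A \le t\} = \Omega$ for $t = +\infty$.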
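The condition $\{T \le t\} \in \mathfrak{A}_t$ can be made tangible on a finite path space. The sketch below rests on assumptions of ours: paths of length 4 with values in $\{-1,0,1,2\}$, $\mathfrak{A}_t$ taken as the $\sigma$-algebra generated by the first $t$ coordinates, and $A = [2,+\infty[$. It compares the first entry time with the last entry time, which does anticipate the future.

```python
import math
from itertools import product

def first_entry_time(path, in_A):
    """Smallest t (1-based) with path[t-1] in A, or math.inf if the path never enters A."""
    for t, x in enumerate(path, start=1):
        if in_A(x):
            return t
    return math.inf

def last_entry_time(path, in_A):
    """Largest t with path[t-1] in A, or math.inf if there is none (not a stopping time)."""
    return max((t for t, x in enumerate(path, start=1) if in_A(x)), default=math.inf)

def decided_by_prefix(paths, T, t):
    """True if the event {T <= t} depends only on the first t coordinates of a path."""
    seen = {}
    for path in paths:
        event = T(path) <= t
        if seen.setdefault(path[:t], event) != event:
            return False
    return True

def in_A(x):
    return x >= 2

paths = list(product((-1, 0, 1, 2), repeat=4))
first_ok = all(decided_by_prefix(paths, lambda p: first_entry_time(p, in_A), t)
               for t in range(1, 5))
last_ok = decided_by_prefix(paths, lambda p: last_entry_time(p, in_A), 1)
```

The check succeeds for the first entry time at every $t$ and fails for the last entry time already at $t = 1$: whether the path returns to $A$ later cannot be read off from its first coordinate.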
If a mapping $T: \Omega \to I$ takes on only countably many values $\le t$ for
every $t \in I$, then it is a stopping time if and only if

$$\{T = t\} \in \mathfrak{A}_t \qquad \text{for all } t \in I. \tag{11.2.2}$$

This follows from the equalities

$$\{T = t\} = \{T \le t\} \setminus \bigcup_{s < t} \{T \le s\}$$

and

$$\{T \le t\} = \bigcup_{s \le t} \{T = s\}$$

if we note that by hypothesis the right-hand sides above are the unions
of countably many sets from $\mathfrak{A}_t$.

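We associate with every stopping time $T$ relative to $(\mathfrak{A}_t)_{t \in I}$ a $\sigma$-algebra
$\mathfrak{A}_T$ by setting

$$\mathfrak{A}_T = \{E \subset \Omega:\ E \cap \{T \le t\} \in \mathfrak{A}_t \text{ for all } t \in I\}. \tag{11.2.3}$$

It is clear that $\mathfrak{A}_T$ contains the set $\Omega$, and with every sequence of sets, it
also contains their union. But when a set $E$ lies in $\mathfrak{A}_T$, also $\complement E$ lies in $\mathfrak{A}_T$,
since we have

$$(\Omega \setminus E) \cap \{T \le t\} = \{T \le t\} \setminus (E \cap \{T \le t\}) \in \mathfrak{A}_t$$

for all $t \in I$. Thus $\mathfrak{A}_T$ is a $\sigma$-algebra. It is called the $\sigma$-algebra of events up
to time $T$. If $T$ is a constant $t_0$ (see Example 1), then obviously $\mathfrak{A}_T = \mathfrak{A}_{t_0}$.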


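11.2.2. Lemma. If $T$ and $T^*$ are stopping times relative to an isotone
family $(\mathfrak{A}_t)_{t \in I}$ of $\sigma$-subalgebras of a $\sigma$-algebra $\mathfrak{A}$ in $\Omega$, then the
following statements hold:

$$T \le T^* \implies \mathfrak{A}_T \subset \mathfrak{A}_{T^*}. \tag{11.2.4}$$

If $I$ contains a countable subset $D$ such that for each $t \in I$ there exists
a $d \in D$ with $t \le d$, then

$$\mathfrak{A}_T \subset \mathfrak{A}. \tag{11.2.5}$$

Proof. For every $\omega \in \Omega$, let $T(\omega) \le T^*(\omega)$, that is, $\{T^* \le t\} \subset
\{T \le t\}$ for each $t \in I$. Hence, $E \cap \{T^* \le t\} = E \cap \{T \le t\} \cap
\{T^* \le t\}$ for every set $E \subset \Omega$ and every $t \in I$. Therefore, $E \in \mathfrak{A}_T$
implies $E \in \mathfrak{A}_{T^*}$. Under the condition given in (11.2.5), we have

$$\Omega = \bigcup_{d \in D} \{T \le d\}$$

and thus

$$E = E \cap \Omega = \bigcup_{d \in D} E \cap \{T \le d\}$$

for every set $E \subset \Omega$. Since $D$ is countable, it now follows that $E \in \mathfrak{A}$ for
every $E \in \mathfrak{A}_T$. ∎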


Now let $(X_t)_{t \in I}$ be a family of random variables on a probability space
$(\Omega,\mathfrak{A},P)$ with values in an arbitrary measurable space $(\Omega',\mathfrak{A}')$; assume that
$(X_t)_{t \in I}$ is adapted to an isotone family $(\mathfrak{A}_t)_{t \in I}$ of $\sigma$-subalgebras. With
every stopping time $T$ relative to $(\mathfrak{A}_t)$ we can then associate a mapping
$X_T: \Omega \to \Omega'$ by defining

$$X_T(\omega) = X_{T(\omega)}(\omega).^6 \tag{11.2.6}$$

Under additional hypotheses $X_T$ is then a random variable. We show:

⁶ If $T$ is constant, equal to $t_0$, then $X_T = X_{t_0}$.
11.2.3. Lemma. $X_T$ is $\mathfrak{A}_T$-measurable if the set of all values $t'$ of $T$
with $t' \le t$ is countable for every $t \in I$.

Proof. We must verify that $E = \{X_T \in A'\}$ is an element of $\mathfrak{A}_T$ for
every set $A' \in \mathfrak{A}'$. But for every $t \in I$,

$$E \cap \{T \le t\} = \bigcup_{\substack{t' \in I \\ t' \le t}} \bigl(\{T = t'\} \cap E\bigr) = \bigcup_{\substack{t' \in I \\ t' \le t}} \bigl(\{T = t'\} \cap \{X_{t'} \in A'\}\bigr).$$

Hence it follows that $E \cap \{T \le t\} \in \mathfrak{A}_t$ and thus $E \in \mathfrak{A}_T$ from (11.2.3)
and the countability hypothesis on $T$. For this hypothesis implies that at
most countably many of the sets $\{T = t'\}$ with $t' \le t$ are nonempty. ∎


If we restrict consideration to super-martingales with finite, totally
ordered index set $I = \{1, \ldots, k\}$, we can now answer the question raised
in the introduction (concerning optional sampling).

11.2.4. Theorem. Let $(X_i)_{i=1,\ldots,k}$ be a super-martingale (martingale)
relative to an isotone family $(\mathfrak{A}_i)_{i=1,\ldots,k}$ of $\sigma$-algebras and let
$(T_j)_{j=1,\ldots,p}$ be an isotone family of stopping times relative to $(\mathfrak{A}_i)$. Then
$(X_{T_j})_{j=1,\ldots,p}$ is also a super-martingale (martingale) relative to
$(\mathfrak{A}_{T_j})_{j=1,\ldots,p}$.


Proof. By Lemma 11.2.2, $(\mathfrak{A}_{T_j})$ is an isotone family of $\sigma$-subalgebras
of the $\sigma$-algebra $\mathfrak{A}$ of the underlying probability space $(\Omega,\mathfrak{A},P)$. By Lemma
11.2.3, $(X_{T_j})$ is adapted to the family $(\mathfrak{A}_{T_j})$. Each of the random variables
$X_{T_j}$ is integrable since

$$E(|X_{T_j}|) = \sum_{i=1}^{k} \int_{\{T_j = i\}} |X_i|\, dP \le \sum_{i=1}^{k} E(|X_i|) < +\infty.$$

Therefore we need only show the inequality for super-martingales,
namely,

$$\int_A X_{T_{j+1}}\, dP \le \int_A X_{T_j}\, dP$$

for every $j \in \{1, \ldots, p-1\}$ and all $A \in \mathfrak{A}_{T_j}$. For this, we set $S = T_j$
and $T = T_{j+1}$. Then $S$ and $T$ are stopping times such that $S \le T$. We
need to show that

$$\int_A X_T\, dP \le \int_A X_S\, dP$$

for all $A \in \mathfrak{A}_S$.
We first treat the special case $T - S \le 1$. Then for every $A \in \mathfrak{A}_S$,

$$\int_A (X_S - X_T)\, dP = \sum_{i=1}^{k-1} \int_{A_i} (X_S - X_T)\, dP = \sum_{i=1}^{k-1} \int_{A_i} (X_i - X_{i+1})\, dP,$$

when we introduce $A_i = A \cap \{S = i\} \cap \{T > S\} = A \cap \{S = i\} \cap \{T > i\}$
($i = 1, \ldots, k-1$). Hence
the asserted inequality follows from the inequalities for the super-martingale
$(X_i)$ provided $A_i \in \mathfrak{A}_i$ for all $i = 1, \ldots, k-1$. Now $\{T > i\} =
\complement\{T \le i\}$ lies in $\mathfrak{A}_i$. From $A \in \mathfrak{A}_S$ it follows that

$$A \cap \{S = i\} = (A \cap \{S \le i\}) \setminus \bigcup_{j=1}^{i-1} (A \cap \{S \le j\}) \in \mathfrak{A}_i.$$

Therefore, indeed, $A_i \in \mathfrak{A}_i$.

We can now proceed as follows in the general case: For every $i = 1,
\ldots, k$, $R_i = \inf(T, S + i)$ is a stopping time according to Example 2.
We have $S \le R_1 \le R_2 \le \cdots \le R_k = T$, $R_1 - S \le 1$, and $R_{i+1} -
R_i \le 1$ for all $i = 1, \ldots, k-1$. From the special case treated above,
since $\mathfrak{A}_S \subset \mathfrak{A}_{R_1} \subset \cdots \subset \mathfrak{A}_{R_k}$, it now follows that for every $A \in \mathfrak{A}_S$,

$$\int_A X_T\, dP = \int_A X_{R_k}\, dP \le \cdots \le \int_A X_{R_1}\, dP \le \int_A X_S\, dP,$$

and thus we have the desired inequality.
If $(X_i)$ is a martingale, then $(X_i)$ and $(-X_i)$, and hence also $(X_{T_j})$ and
$(-X_{T_j})$ are super-martingales. But then $(X_{T_j})$ is a martingale. ∎

11.2.5. Corollary. If $(X_i)_{i=1,\ldots,k}$ is a super-martingale with
$I = \{1, \ldots, k\}$ as index set and $T$ is a stopping time [relative to
$(\mathfrak{A}_i)_{i=1,\ldots,k}$], then the following inequalities hold:

$$E(X_1) \ge E(X_T) \ge E(X_k), \tag{11.2.7}$$
$$E(|X_T|) \le E(X_1) + 2E(X_k^-). \tag{11.2.8}$$

Proof. We apply Theorem 11.2.4 to the isotone family of stopping times
$T_1 = 1$, $T_2 = T$, $T_3 = k$. Then $(X_{T_j})_{j=1,2,3}$ is a super-martingale and hence

$$E(X_{T_1}) \ge E(X_{T_2}) \ge E(X_{T_3}).$$

But this is inequality (11.2.7).

By (11.2.7), $E(X_T) \le E(X_1)$. Since $(-X_i^-)_{i=1,\ldots,k}$ is also a super-martingale,
the inequality $E(X_T^-) \le E(X_k^-)$ follows again from (11.2.7).
Since $|X_T| = X_T^+ + X_T^- = X_T + 2X_T^-$, we now obtain the asserted
inequality (11.2.8). ∎

Remark. If $(X_i)_{i=1,\ldots,k}$ is a martingale, then (11.2.7) becomes the
equality $E(X_1) = E(X_T) = E(X_k)$. As for our introductory example, this
tells us that the expected gain remains unchanged by optional stopping.
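Inequality (11.2.7) and the equality of the Remark can be confirmed by exact enumeration for a walk with $\pm 1$ steps. This is a sketch under assumptions of ours: independent steps with $P\{+1\} = $ `p_up`, the fortune $X_i$ given by the $i$th partial sum, and stopping rules that inspect only the walk observed so far (so that $T$ is a stopping time with $T \le k$). The names `expected_stopped_value`, `quit_when_ahead`, `at_once`, and `never` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_stopped_value(k, p_up, stop_rule):
    """Exact E(X_T) where X_i = x_1 + ... + x_i and
    T = first i with stop_rule(X_1..X_i) true, or k if the rule never fires."""
    total = Fraction(0)
    for steps in product((1, -1), repeat=k):
        prob, pos, walk = Fraction(1), 0, []
        for x in steps:
            prob *= p_up if x == 1 else 1 - p_up
            pos += x
            walk.append(pos)
        T = next((i for i in range(1, k + 1) if stop_rule(walk[:i])), k)
        total += prob * walk[T - 1]
    return total

def quit_when_ahead(prefix):   # stop as soon as the fortune is positive
    return prefix[-1] >= 1

def at_once(prefix):           # T = 1
    return True

def never(prefix):             # T = k
    return False

half, third = Fraction(1, 2), Fraction(1, 3)
fair = [expected_stopped_value(6, half, r) for r in (at_once, quit_when_ahead, never)]
unfair = [expected_stopped_value(6, third, r) for r in (at_once, quit_when_ahead, never)]
```

For the fair game all three expectations vanish; for the unfavourable game (a super-martingale) they decrease from $E(X_1)$ through $E(X_T)$ to $E(X_k)$.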
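PROBLEMS

1. Let $(\mathfrak{A}_t)_{t \in I}$ be an isotone family of $\sigma$-algebras in a set $\Omega$ with index set
$I = [0,+\infty[$, and let $S$ and $T$ be stopping times with respect to
$(\mathfrak{A}_t)_{t \in I}$. Prove:

(a) $T$ is $\mathfrak{A}_T$-measurable.
(b) Each of the sets $\{S < T\}$, $\{S \le T\}$, $\{S = T\}$ is in $\mathfrak{A}_S \cap \mathfrak{A}_T$.

2. Let $(\mathfrak{A}_t)_{t \in I}$ be an isotone family of $\sigma$-algebras in a set $\Omega$ with index
set $I = [0,+\infty[$. Define $\mathfrak{A}_t^+ = \bigcap_{t < s} \mathfrak{A}_s$ for each $t \in I$. Prove:

(a) $(\mathfrak{A}_t^+)_{t \in I}$ is an isotone family of $\sigma$-algebras in $\Omega$.

(b) A mapping $T: \Omega \to I$ is a stopping time with respect to $(\mathfrak{A}_t^+)$ if
and only if $\{T < t\} \in \mathfrak{A}_t$ for all $t \in I$.

(c) Every stopping time with respect to $(\mathfrak{A}_t)$ is a stopping time with
respect to $(\mathfrak{A}_t^+)$ (but not conversely in general).

3. Let $(X_i)_{i=1,\ldots,k}$ be a super-martingale, and $\lambda \in \mathbb{R}_+$. Introduce the
random variables $\overline{X} = \sup_{i=1,\ldots,k} X_i$ and $\underline{X} = \inf_{i=1,\ldots,k} X_i$, and prove:

(a) $\lambda P\{\overline{X} \ge \lambda\} \le E(X_1) + E(X_k^-)$.
(b) $\lambda P\{\underline{X} \le -\lambda\} \le E(X_k^-)$.

[Hint: To obtain (a), prove that

$$T(\omega) = \begin{cases} \text{smallest } i \in \{1, \ldots, k\} \text{ satisfying } X_i(\omega) \ge \lambda, \\ k, \quad \text{if } X_i(\omega) < \lambda \text{ for all } i \in \{1, \ldots, k\} \end{cases}$$

is a stopping time, and apply (11.2.7).]

4. Let $(X_i)_{i=1,\ldots,k}$ be a martingale of square-integrable random variables.
Deduce from Problem 3

$$P\Bigl\{\sup_{i=1,\ldots,k} |X_i| \ge \lambda\Bigr\} \le \frac{1}{\lambda^2}\, E(X_k^2)$$

for $\lambda > 0$. Deduce from this Kolmogorov's inequality (6.3.12).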


11.3 THE DOOB INEQUALITIES 


Similar to the way in which the Hájek-Rényi inequality plays the 
decisive role in the proof of Kolmogorov's theorem on the strong law of 
large numbers, the convergence theorems of the next section are based on 
two inequalities of J. L. Doob. They are built upon the following com- 
binatorial concepts. 


11.3.1. Definition. Let $(x_i)_{i=1,\dots,k}$ be a finite sequence of real numbers and let $[a,b]$ be a compact interval in $\mathbb{R}$ with $a < b$. Then the integer $\underline{U}_{[a,b]}$, called the number of downcrossings of $[a,b]$ by $(x_i)_{i=1,\dots,k}$, is defined as follows: If there are indices $i, j \in \{1,\dots,k\}$ with $i < j$ and $x_i \ge b$, $x_j \le a$, then let $\underline{U}_{[a,b]}$ be the largest natural number $l$ for which there exist indices $i_1 < \dots < i_{2l}$ from $\{1,\dots,k\}$ satisfying

$$x_{i_{2\lambda-1}} \ge b \quad\text{and}\quad x_{i_{2\lambda}} \le a \quad\text{for all } \lambda = 1, \dots, l. \tag{11.3.1}$$

If there are no such indices $i$, $j$, then let $\underline{U}_{[a,b]} = 0$.

The number of downcrossings of $[-b,-a]$ by the sequence $(-x_i)_{i=1,\dots,k}$ is denoted by $\overline{U}_{[a,b]}$ and is called the number of upcrossings of $[a,b]$ by $(x_i)_{i=1,\dots,k}$.


For a given sequence $(x_i)_{i=1,\dots,k}$, we obviously determine $\underline{U}_{[a,b]}$ by the following procedure: Let $i_1$ be the smallest index with $x_{i_1} \ge b$; $i_2$ the next larger index with $x_{i_2} \le a$; $i_3$ the next larger index with $x_{i_3} \ge b$, and so on. This process terminates after $p$ steps. Then $\underline{U}_{[a,b]} = [p/2]$, where $[p/2]$ denotes the greatest integer $\le p/2$. In particular, $\underline{U}_{[a,b]} \le [k/2]$. To determine $\overline{U}_{[a,b]}$, we modify the procedure in the obvious way. Both procedures will be used in the proof of Theorem 11.3.2.


If we consider, say, the sequence 
0710/07 1170, 0707 TL 1830.00 07173218 
corresponding to $k = 20$, then $\underline{U}_{[0,1]} = 3$ and $\overline{U}_{[0,1]} = 4$.
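The counting procedure just described can be sketched in a few lines of Python. This is only an illustration: the function names and the short 0/1 sequence are our own assumptions (the sequence is hypothetical and is not the 20-term sequence of the text), chosen so that it has three downcrossings and four upcrossings of $[0,1]$.

```python
def downcrossings(xs, a, b):
    """Number of downcrossings of [a, b] by the finite sequence xs (a < b)."""
    count = 0
    waiting_for_high = True        # first look for a value >= b ...
    for x in xs:
        if waiting_for_high:
            if x >= b:
                waiting_for_high = False
        elif x <= a:               # ... then for a later value <= a
            count += 1
            waiting_for_high = True
    return count


def upcrossings(xs, a, b):
    """Upcrossings of [a, b] by xs = downcrossings of [-b, -a] by (-x_i)."""
    return downcrossings([-x for x in xs], -b, -a)


# Hypothetical 0/1 sequence with k = 10 (an assumption, not from the text).
xs = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
print(downcrossings(xs, 0, 1), upcrossings(xs, 0, 1))
```

The greedy scan is exactly the alternating search for $i_1, i_2, \dots$; the upcrossing count is obtained by the sign change used in Definition 11.3.1.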


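Now let $(X_i)_{i=1,\dots,k}$ be a finite sequence of real random variables on a probability space $(\Omega, \mathfrak{A}, P)$. For every $\omega \in \Omega$ we then obtain the numbers $\underline{U}_{[a,b]}(\omega)$ and $\overline{U}_{[a,b]}(\omega)$ of down- and upcrossings respectively of the compact interval $[a,b]$ by the sequence $(X_i(\omega))_{i=1,\dots,k}$. Then $\omega \mapsto \underline{U}_{[a,b]}(\omega)$ and $\omega \mapsto \overline{U}_{[a,b]}(\omega)$ are random variables. The measurability of $\underline{U}_{[a,b]}$ (and thus also of $\overline{U}_{[a,b]}$) follows from the relationship

$$\{\underline{U}_{[a,b]} \ge l\} = \bigcup_{1 \le i_1 < \dots < i_{2l} \le k} \; \bigcap_{\lambda=1}^{l} \bigl(\{X_{i_{2\lambda-1}} \ge b\} \cap \{X_{i_{2\lambda}} \le a\}\bigr),$$

which holds for every natural number $l \in \{1, \dots, [k/2]\}$. Here the union is to be taken over all $(2l)$-tuples of natural numbers $(i_1, \dots, i_{2l})$ satisfying $1 \le i_1 < \dots < i_{2l} \le k$.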

The promised inequalities now relate to the expected values⁷ of these random variables $\underline{U}_{[a,b]}$ and $\overline{U}_{[a,b]}$ for the case of a super-martingale $(X_i)$:


11.3.2. Theorem (Doob's Inequalities). For every super-martingale $(X_i)_{i \in I}$ with index set $I = \{1, \dots, k\}$ and every compact interval $[a,b]$ with $a < b$, the expected values of $\underline{U}_{[a,b]}$ and $\overline{U}_{[a,b]}$ satisfy the inequalities

$$E(\underline{U}_{[a,b]}) \le \frac{1}{b-a}\, E\bigl[\inf(X_1,b) - \inf(X_k,b)\bigr], \tag{11.3.2}$$

$$E(\overline{U}_{[a,b]}) \le \frac{1}{b-a}\, E\bigl((X_k - a)^-\bigr). \tag{11.3.3}$$

⁷ As elementary random variables, $\underline{U}_{[a,b]}$ and $\overline{U}_{[a,b]}$ are of course integrable.
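As a sanity check on (11.3.2) and (11.3.3), the following Python sketch enumerates all paths of a simple symmetric random walk started at 0, which is a martingale and hence a super-martingale, and compares both sides of the two inequalities in exact rational arithmetic. The choice of walk and all function names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product


def count_down(xs, a, b):
    """Downcrossings of [a, b] by xs, via the greedy alternating search."""
    count, waiting_for_high = 0, True
    for x in xs:
        if waiting_for_high:
            if x >= b:
                waiting_for_high = False
        elif x <= a:
            count += 1
            waiting_for_high = True
    return count


def count_up(xs, a, b):
    """Upcrossings of [a, b] = downcrossings of [-b, -a] by the negated path."""
    return count_down([-x for x in xs], -b, -a)


def doob_check(k, a, b):
    """Return (E U_down, bound (11.3.2), E U_up, bound (11.3.3)) for the walk
    X_1 = 0, X_{i+1} = X_i +/- 1, with all 2**(k-1) paths equally likely."""
    e_down = e_up = rhs_down = rhs_up = Fraction(0)
    for steps in product((-1, 1), repeat=k - 1):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)
        e_down += count_down(path, a, b)
        e_up += count_up(path, a, b)
        rhs_down += min(path[0], b) - min(path[-1], b)   # inf(X_1,b) - inf(X_k,b)
        rhs_up += max(a - path[-1], 0)                   # (X_k - a)^-
    weight = Fraction(1, 2 ** (k - 1))
    width = Fraction(b - a)
    return (e_down * weight, rhs_down * weight / width,
            e_up * weight, rhs_up * weight / width)


e_down, bound_down, e_up, bound_up = doob_check(10, -1, 1)
print(e_down <= bound_down, e_up <= bound_up)
```

For $k = 2$ and $[a,b] = [0,1]$ the upcrossing inequality holds with equality in this example, which shows that (11.3.3) cannot be improved in general.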
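Proof. If we set $X_i' = \inf(X_i, b)$ for every $i = 1, \dots, k$, then $(X_i')$ is a super-martingale by (11.1.5). The number $\underline{U}_{[a,b]}$ ($\overline{U}_{[a,b]}$) of downcrossings (upcrossings) of $[a,b]$ obviously remains unchanged if we replace $(X_i)$ by $(X_i')$. Now let $p$ be an even natural number $\ge k$. By induction we define a sequence $(T_j)_{j=0,1,\dots,p}$ of stopping times relative to the family $(\mathfrak{A}_i)_{i=0,1,\dots,k}$ of σ-algebras by setting, for every $\omega \in \Omega$:

$$T_0(\omega) = 1;$$

$$T_{2\lambda-1}(\omega) = \begin{cases} \text{smallest } i \in \{1,\dots,k\} \text{ with } i \ge T_{2\lambda-2}(\omega) \text{ and } X_i'(\omega) \ge b, \\ k, \text{ if no such } i \text{ exists;} \end{cases}$$

$$T_{2\lambda}(\omega) = \begin{cases} \text{smallest } i \in \{1,\dots,k\} \text{ with } i \ge T_{2\lambda-1}(\omega) \text{ and } X_i'(\omega) \le a, \\ k, \text{ if no such } i \text{ exists.} \end{cases}$$

Here $\lambda$ runs through the values $1, 2, \dots, p/2$ and we let $\mathfrak{A}_0 = \{\emptyset, \Omega\}$.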
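We are dealing with stopping times, since for every $i = 1, \dots, k-1$ and $\lambda = 1, \dots, p/2$,

$$\{T_{2\lambda-1} = i\} = \{T_{2\lambda-2} \le i\} \cap \{X_i' \ge b\} \cap \complement \bigcup_{j<i} \bigl(\{T_{2\lambda-2} \le j\} \cap \{X_j' \ge b\}\bigr),$$

$$\{T_{2\lambda} = i\} = \{T_{2\lambda-1} \le i\} \cap \{X_i' \le a\} \cap \complement \bigcup_{j<i} \bigl(\{T_{2\lambda-1} \le j\} \cap \{X_j' \le a\}\bigr),$$

and furthermore,

$$\{T_{2\lambda-1} = k\} = \complement \bigcup_{j<k} \bigl(\{T_{2\lambda-2} \le j\} \cap \{X_j' \ge b\}\bigr),$$

$$\{T_{2\lambda} = k\} = \complement \bigcup_{j<k} \bigl(\{T_{2\lambda-1} \le j\} \cap \{X_j' \le a\}\bigr).$$

Accordingly, all sets $\{T_j = i\}$ and thus also all sets $\{T_j \le i\}$ lie in $\mathfrak{A}_i$ ($i = 0, 1, \dots, k$). By definition the sequence $(T_j)_{j=0,1,\dots,p}$ is isotone. Therefore, Theorem 11.2.4 applies, according to which $(X'_{T_j})_{j=0,1,\dots,p}$ is a super-martingale and thus the sequence $(E(X'_{T_j}))$ is antitone. The sequence $(T_j)$ is in fact isotone in a stronger sense: Since the conditions $X_i'(\omega) \le a$ and $X_i'(\omega) \ge b$ cannot be satisfied simultaneously, we have $T_j(\omega) < T_{j+1}(\omega)$ for every $j = 1, \dots, p-1$ with $T_j(\omega) < k$. Since $p \ge k$, we must have $T_p(\omega) = k$ for all $\omega \in \Omega$.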

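By the construction of the stopping times $T_j$, we see that $\underline{U}_{[a,b]}(\omega)$ is now precisely the number of downcrossings of $[a,b]$ by the sequence $(X'_{T_j}(\omega))_{j=1,\dots,p}$ and therefore, obviously,

$$(b-a)\,\underline{U}_{[a,b]} \le \sum_{\lambda=1}^{p/2} \bigl(X'_{T_{2\lambda-1}} - X'_{T_{2\lambda}}\bigr) = X'_{T_1} - X'_{T_p} + \sum_{\lambda=1}^{p/2-1} \bigl(X'_{T_{2\lambda+1}} - X'_{T_{2\lambda}}\bigr).$$

By integration we then have

$$(b-a)\,E(\underline{U}_{[a,b]}) \le E(X'_{T_1}) - E(X'_{T_p}) + \sum_{\lambda=1}^{p/2-1} \bigl[E(X'_{T_{2\lambda+1}}) - E(X'_{T_{2\lambda}})\bigr],$$

and thus, due to the antitonicity of the sequence $(E(X'_{T_j}))_{j=0,1,\dots,p}$ and to $T_0 = 1$, $T_p = k$, we have

$$(b-a)\,E(\underline{U}_{[a,b]}) \le E(X'_{T_0}) - E(X'_{T_p}) = E(X_1') - E(X_k').$$

But this is the inequality (11.3.2).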
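To prove (11.3.3), we can proceed quite similarly. We choose $p$ as above, but define $(T_j)_{j=0,1,\dots,p}$ inductively as follows:

$$T_0(\omega) = 1;$$

$$T_{2\lambda-1}(\omega) = \begin{cases} \text{the smallest } i \in \{1,\dots,k\} \text{ with } i \ge T_{2\lambda-2}(\omega) \text{ and } X_i'(\omega) \le a, \\ k, \text{ if no such } i \text{ exists;} \end{cases}$$

$$T_{2\lambda}(\omega) = \begin{cases} \text{the smallest } i \in \{1,\dots,k\} \text{ with } i \ge T_{2\lambda-1}(\omega) \text{ and } X_i'(\omega) \ge b, \\ k, \text{ if no such } i \text{ exists.} \end{cases}$$

Then we show, as above, that the sequence $(E(X'_{T_j}))_{j=0,1,\dots,p}$ is antitone. By the construction of the above stopping times, the difference $X'_{T_{2\lambda}} - X'_{T_{2\lambda-1}}$ is $\ge b - a$ for $\lambda = 1, \dots, \overline{U}_{[a,b]}$, and $= 0$ for $p/2 \ge \lambda > \overline{U}_{[a,b]} + 1$. In the case $\lambda = \overline{U}_{[a,b]} + 1$, it is equal to $X_k' - X'_{T_{2\lambda-1}}$. Here either $X'_{T_{2\lambda-1}} = X_k'$, or $X'_{T_{2\lambda-1}} \le a$ and then also $X_k < b$, that is, $X_k' = X_k$. For $\lambda = \overline{U}_{[a,b]} + 1$, in both cases we have

$$X'_{T_{2\lambda}} - X'_{T_{2\lambda-1}} \ge \inf(0, X_k - a) = -(X_k - a)^-.$$

Thus we obtain the inequality

$$\sum_{\lambda=1}^{p/2} \bigl(X'_{T_{2\lambda}} - X'_{T_{2\lambda-1}}\bigr) \ge (b-a)\,\overline{U}_{[a,b]} - (X_k - a)^-$$

and hence, by integrating and taking account of the antitonicity of $(E(X'_{T_j}))$, we obtain inequality (11.3.3). ∎

PROBLEMS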


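1. Let $(X_i)_{i=1,\dots,k}$ be a sub-martingale, and let $[a,b]$ be a compact interval with end points $a < b$. Prove:

(a) $E(\underline{U}_{[a,b]}) \le \dfrac{1}{b-a}\, E\bigl[(X_k - b)^+\bigr]$;

(b) $E(\overline{U}_{[a,b]}) \le \dfrac{1}{b-a}\, E\bigl[\sup(X_k,a) - \sup(X_1,a)\bigr]$.


2. Let $(X_i)_{i=1,\dots,k}$ be a super-martingale relative to an isotone family $(\mathfrak{A}_i)_{i=1,\dots,k}$ of σ-algebras. Imitate the proof of Theorem 11.3.2 and prove the following strengthened form of the inequalities (11.3.2) and (11.3.3):

(a) $E(\underline{U}_{[a,b]} \mid \mathfrak{A}_1) \le \dfrac{1}{b-a}\, \bigl[\inf(X_1,b) - E(\inf(X_k,b) \mid \mathfrak{A}_1)\bigr]$;

(b) $E(\overline{U}_{[a,b]} \mid \mathfrak{A}_1) \le \dfrac{1}{b-a}\, E\bigl((X_k - a)^- \mid \mathfrak{A}_1\bigr)$.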


11.4 CONVERGENCE THEOREMS 


Example 3 of Section 11.1, which showed that the partial sums of an 
independent sequence of real, integrable, centered random variables form 
a martingale, and the close connection of this martingale with the sequence 
of random variables considered in the strong law of large numbers, raise 
the question of convergence of martingales and super-martingales. We shall now show that (at least in the discrete case, that is, with $\mathbb{N}$ as index set) the interdependence of the random variables of a super-martingale is so strong that even simple additional hypotheses guarantee convergence almost everywhere.

The following considerations again refer to a probability space $(\Omega, \mathfrak{A}, P)$. Let $(\mathfrak{A}_n)_{n \in \mathbb{N}}$ be an isotone sequence of σ-subalgebras of $\mathfrak{A}$.


11.4.1. Theorem. Every super-martingale $(X_n)_{n \in \mathbb{N}}$ relative to $(\mathfrak{A}_n)_{n \in \mathbb{N}}$ which satisfies the condition

$$\sup_{n \in \mathbb{N}} E(X_n^-) < \infty \tag{11.4.1}$$

converges almost surely to an integrable random variable $X_\infty$.


Proof. The sequence $(X_n)$ converges almost surely if the event

$$\Omega_0 = \{\limsup_{n\to\infty} X_n > \liminf_{n\to\infty} X_n\}$$

has probability zero. To verify this property, we let $\overline{U}^{k}_{[a,b]}(\omega)$ denote the number of upcrossings of the interval $[a,b]$ (with $a < b$, $a, b \in \mathbb{R}$) by the finite sequence $(X_i(\omega))_{i=1,\dots,k}$. The sequence $(\overline{U}^{k}_{[a,b]})_{k=1,2,\dots}$ is then obviously isotone. We set $\overline{U}^{\infty}_{[a,b]} = \sup_{k \in \mathbb{N}} \overline{U}^{k}_{[a,b]}$ and thus obtain a random variable with values in $\{0\} \cup \mathbb{N} \cup \{+\infty\}$. For every $\omega \in \Omega_0$ and rational numbers $a$, $b$ with $\liminf_{n\to\infty} X_n(\omega) < a < b < \limsup_{n\to\infty} X_n(\omega)$ we now have $\overline{U}^{\infty}_{[a,b]}(\omega) = \infty$, since there are infinitely many $n'$ with $X_{n'}(\omega) > b$ and infinitely many $n''$ with $X_{n''}(\omega) < a$. Thus we have

$$\Omega_0 \subset \bigcup_{\substack{a,\, b \text{ rational} \\ a < b}} \{\overline{U}^{\infty}_{[a,b]} = \infty\}.$$

Therefore, it will be sufficient to show that $P\{\overline{U}^{\infty}_{[a,b]} = \infty\} = 0$ for every pair $a$, $b$ of real numbers with $a < b$. For this we use (11.3.3), according to which, for every $k \in \mathbb{N}$,

$$E(\overline{U}^{k}_{[a,b]}) \le \frac{1}{b-a}\, E\bigl((X_k - a)^-\bigr),$$

hence, by the Monotone Convergence Theorem,

$$E(\overline{U}^{\infty}_{[a,b]}) \le \frac{1}{b-a}\, \sup_{k \in \mathbb{N}} E\bigl((X_k - a)^-\bigr).$$

If we note the inequality $(X_n - a)^- = \sup(0, a - X_n) \le a^+ + X_n^-$, then from the hypothesis (11.4.1) we obtain

$$\sup_{n \in \mathbb{N}} E\bigl((X_n - a)^-\bigr) \le a^+ + \sup_{n \in \mathbb{N}} E(X_n^-) < \infty$$

and thus, from the second Doob inequality, $E(\overline{U}^{\infty}_{[a,b]}) < \infty$, and therefore $P\{\overline{U}^{\infty}_{[a,b]} = \infty\} = 0$. This proves the almost sure convergence of $(X_n)$ to a numerical random variable $X_\infty$ on $\Omega$. This is integrable, since by (11.2.8) (for the constant stopping time $T = k$) we have the inequality

$$E(|X_k|) \le E(X_1) + 2 \sup_{n \in \mathbb{N}} E(X_n^-) \qquad (k = 1, 2, \dots),$$

which implies $\sup_n E(|X_n|) < \infty$ by hypothesis. But then Fatou's Lemma yields

$$E(|X_\infty|) = E(\lim_{n\to\infty} |X_n|) \le \liminf_{n\to\infty} E(|X_n|) < \infty$$

and thus the desired integrability of $X_\infty$. ∎
In particular, condition (11.4.1) is obviously satisfied if

$$\sup_{n \in \mathbb{N}} E(|X_n|) < \infty. \tag{11.4.2}$$

But the proof of Theorem 11.4.1 tells us that conditions (11.4.1) and (11.4.2) are actually equivalent for a super-martingale $(X_n)_{n \in \mathbb{N}}$. Therefore condition (11.4.1) is satisfied if all $X_n \ge 0$ or if the super-martingale $(X_n)_{n \in \mathbb{N}}$ is uniformly integrable. In the latter case Theorem 2.12.2 yields the validity of (11.4.2). It now follows further from Theorem 2.12.4 that a uniformly integrable super-martingale $(X_n)_{n \in \mathbb{N}}$ also converges in mean to $X_\infty$: We need only take into account Theorem 2.11.4, by which stochastic convergence follows from almost sure convergence.
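The difference between almost sure convergence and convergence in mean can be seen in a small simulation. The sketch below is an illustration under our own assumptions (the product martingale and the function name are not from the text): $M_n$ is a product of independent factors taking the values 0.5 and 1.5 with probability 1/2 each, so that $(M_n)$ is a non-negative martingale with $E(M_n) = 1$. Theorem 11.4.1 applies, and the simulated path collapses toward 0; since $E(M_n) = 1$ for all $n$, the convergence cannot hold in mean, so this martingale is not uniformly integrable.

```python
import random


def product_martingale(n_steps, seed=0):
    """One sample path of M_0 = 1, M_{n+1} = M_n * F_{n+1}, where the F_i are
    independent and equal to 0.5 or 1.5 with probability 1/2 each."""
    rng = random.Random(seed)
    path = [1.0]
    for _ in range(n_steps):
        path.append(path[-1] * rng.choice((0.5, 1.5)))
    return path


path = product_martingale(2000)
print(path[-1])   # extremely small: the path tends to 0 although E(M_n) = 1
```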

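In conjunction with the above convergence theorem, the question arises of whether the super-martingale $(X_n)_{n \in \mathbb{N}}$ of Theorem 11.4.1 becomes a super-martingale $(X_n)_{n \in \mathbb{N} \cup \{\infty\}}$ by the addition of $X_\infty$. To this end, we enlarge the sequence $(\mathfrak{A}_n)$ by the following σ-algebra, generated by all $\mathfrak{A}_n$ in $\Omega$:

$$\mathfrak{A}_\infty = \mathfrak{A}\Bigl(\bigcup_{n=1}^{\infty} \mathfrak{A}_n\Bigr). \tag{11.4.3}$$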
11.4.2. Corollary 1. Let $(X_n)_{n \in \mathbb{N}}$ be a super-martingale relative to the sequence $(\mathfrak{A}_n)_{n \in \mathbb{N}}$ whose random variables $X_n$ are all $\ge 0$ or uniformly integrable. Then $(X_n)$ converges almost surely to an integrable random variable $X_\infty$ such that $(X_n)_{n \in \mathbb{N} \cup \{\infty\}}$ is a super-martingale relative to $(\mathfrak{A}_n)_{n \in \mathbb{N} \cup \{\infty\}}$.


Proof. By Theorem 11.4.1, $(X_n)$ converges almost surely to an integrable random variable $X_\infty$. Since every $X_n$ is $\mathfrak{A}_n$- and thus $\mathfrak{A}_\infty$-measurable, the set $\{\limsup X_n = \liminf X_n\}$ of all $\omega$ for which $\lim X_n(\omega)$ exists lies in $\mathfrak{A}_\infty$. Therefore we can assume that $X_\infty$ is $\mathfrak{A}_\infty$-measurable and real-valued. Thus all we still have to verify is the inequality

$$\int_A X_n \, dP \ge \int_A X_\infty \, dP$$

for every $A \in \mathfrak{A}_n$ and every $n \in \mathbb{N}$. To this end, let $m$ be a natural number $> n$. Then $\int_A X_n \, dP \ge \int_A X_m \, dP$. By the passage to the limit $m \to \infty$ we now obtain the assertion: In the case $X_m \ge 0$ ($m = 1, 2, \dots$) from Fatou's Lemma, and in the case of uniform integrability, because of the convergence in mean of $(X_m)$ to $X_\infty$ just noted. ∎


11.4.3. Corollary 2. Every uniformly integrable martingale $(X_n)_{n \in \mathbb{N}}$ relative to $(\mathfrak{A}_n)_{n \in \mathbb{N}}$ converges almost surely and in mean to an integrable random variable $X_\infty$ such that $(X_n)_{n \in \mathbb{N} \cup \{\infty\}}$ is also a martingale relative to $(\mathfrak{A}_n)_{n \in \mathbb{N} \cup \{\infty\}}$.


Proof. It suffices to apply Corollary 1 to $(X_n)$ and $(-X_n)$. The reasoning used in the discussion following (11.4.2) yields that $(X_n)$ also converges in mean to $X_\infty$. ∎

Examples of uniformly integrable martingales are easily found. We show that Example 2 of Section 11.1 is such an example.


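11.4.4. Theorem. For every integrable real random variable $X$ on $(\Omega, \mathfrak{A}, P)$ and every isotone family $(\mathfrak{A}_t)_{t \in I}$ of σ-subalgebras of $\mathfrak{A}$ (with arbitrary partially ordered index set $I$), $(E^{\mathfrak{A}_t}(X))_{t \in I}$ is a uniformly integrable martingale relative to $(\mathfrak{A}_t)_{t \in I}$.


Proof. All we need to prove is the uniform integrability of the family $(X_t)_{t \in I}$ of random variables $X_t = E^{\mathfrak{A}_t}(X)$. Since $|X_t| \le E^{\mathfrak{A}_t}(|X|)$ holds almost surely for every $t \in I$ and since $X_t$ is $\mathfrak{A}_t$-measurable, we obtain for every $\alpha \ge 0$,

$$\int_{\{|X_t| \ge \alpha\}} |X_t| \, dP \le \int_{\{|X_t| \ge \alpha\}} |X| \, dP,$$

and, in particular, for $\alpha = 0$, $E(|X_t|) \le E(|X|)$. By the Chebyshev-Markov inequality, for $\alpha > 0$ we then have

$$P\{|X_t| \ge \alpha\} \le \frac{1}{\alpha}\, E(|X_t|) \le \frac{1}{\alpha}\, E(|X|) \qquad (t \in I).$$

By Theorem 2.9.6, for every $\varepsilon > 0$ there is a $\delta > 0$ such that $P(A) < \delta$, $A \in \mathfrak{A}$, implies $\int_A |X| \, dP < \varepsilon$. Accordingly,

$$\int_{\{|X_t| \ge \alpha\}} |X_t| \, dP \le \int_{\{|X_t| \ge \alpha\}} |X| \, dP < \varepsilon$$

for all $\alpha > \delta^{-1} E(|X|)$ and every $t \in I$. This proves the uniform integrability of $(X_t)_{t \in I}$. ∎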
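On a finite probability space the martingale of Theorem 11.4.4 can be computed explicitly. The following Python sketch is a toy model under our own assumptions (the sample space $\{0, \dots, 15\}$ with uniform $P$, the dyadic block σ-algebras, the variable $X$ and the function name are all hypothetical): $E^{\mathfrak{A}_m}(X)$ is the block average of $X$ at level $m$, and the code checks the martingale property and the bound $E(|X_m|) \le E(|X|)$ used in the proof.

```python
from fractions import Fraction


def cond_exp(values, level):
    """E(X | A_level) on Omega = {0, ..., N-1} with uniform P, where N is a
    power of 2 and A_level is generated by 2**level consecutive equal blocks."""
    n = len(values)
    block = n >> level
    out = []
    for start in range(0, n, block):
        mean = sum(values[start:start + block], Fraction(0)) / block
        out.extend([mean] * block)
    return out


X = [Fraction((i - 5) ** 3) for i in range(16)]      # hypothetical random variable
levels = [cond_exp(X, m) for m in range(5)]          # X_m = E(X | A_m), m = 0..4

# Martingale property: E(X_{m+1} | A_m) = X_m for m = 0..3.
is_martingale = all(cond_exp(levels[m + 1], m) == levels[m] for m in range(4))

# L1 norms E|X_m|; each is bounded by E|X| = E|X_4|.
l1_norms = [sum(abs(v) for v in lv) / 16 for lv in levels]
print(is_martingale, l1_norms)
```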

Under an additional assumption on $I$ which is often satisfied in applications, and in particular for every totally ordered set $I$ and thus especially for $I = \mathbb{N}$, the converse of Theorem 11.4.4 holds. This additional assumption is that the partially ordered set $I$ is filtering to the right, that is, for any two indices $t_1, t_2 \in I$ there always exists a $t \in I$ such that $t_1 \le t$ and $t_2 \le t$. We then obtain the following representation theorem:


11.4.5. Theorem. For every uniformly integrable martingale $(X_t)_{t \in I}$ relative to an isotone family $(\mathfrak{A}_t)_{t \in I}$ of σ-subalgebras of $\mathfrak{A}$ such that $I$ is filtering to the right, there exists an integrable real random variable $X$ such that

$$X_t = E^{\mathfrak{A}_t}(X), \text{ almost surely, for all } t \in I. \tag{11.4.4}$$


Proof. First, the assertion is true in the case $I = \mathbb{N}$, since by Corollary 11.4.3 there exists an integrable random variable $X_\infty$ such that $(X_n)_{n \in \mathbb{N} \cup \{\infty\}}$ is a martingale relative to $(\mathfrak{A}_n)_{n \in \mathbb{N} \cup \{\infty\}}$. But then $X_n = E^{\mathfrak{A}_n}(X_\infty)$ almost surely for all $n \in \mathbb{N}$; thus, $X = X_\infty$ yields the desired result.

For arbitrary $I$ filtering to the right, we show⁸ that for every $\varepsilon > 0$, there exists $t_\varepsilon \in I$ such that $E(|X_s - X_t|) < \varepsilon$ for all $s, t \in I$ satisfying $s \ge t_\varepsilon$ and $t \ge t_\varepsilon$. If this were not the case, then for some $\varepsilon > 0$ there would be an isotone sequence $(t_n)_{n \in \mathbb{N}}$ in $I$ such that $E(|X_{t_{n+1}} - X_{t_n}|) \ge \varepsilon$ for all $n \in \mathbb{N}$. Then $(X_{t_n})_{n \in \mathbb{N}}$ would not be a Cauchy sequence in $\mathcal{L}^1(P)$. But this is impossible, since $(X_{t_n})$ is a uniformly integrable martingale relative to $(\mathfrak{A}_{t_n})$ and thus, by the remark made after (11.4.2), converges in mean.

Now consider the sequence $(t_{1/n})_{n \in \mathbb{N}}$. Since $I$ is filtering to the right, we can choose this sequence to be isotone. Then $(X_{t_{1/n}})_{n \in \mathbb{N}}$ is a Cauchy sequence in $\mathcal{L}^1(P)$, that is, it converges in mean to an integrable random variable $X$. By the construction of $(t_{1/n})$, we have $E(|X_t - X_{t_{1/n}}|) \le 1/m$ for all $t \ge t_{1/m}$ and all $n \ge m$ for arbitrary $m \in \mathbb{N}$, whence follows, for $n \to \infty$,

$$E(|X_t - X|) \le \frac{1}{m}$$

for all $m \in \mathbb{N}$ and all $t \ge t_{1/m}$. Thus the martingale converges in mean "along $I$" to $X$. As in the proof of Corollary 11.4.2, we now show that $X$ yields the desired result: Let $s \in I$ be arbitrarily chosen. Then

$$\int_A X_s \, dP = \int_A X_t \, dP$$

for all $A \in \mathfrak{A}_s$ and all $t \ge s$. The passage to the limit in $t$ "along $I$" yields the desired equality

$$\int_A X_s \, dP = \int_A X \, dP,$$

since $(1_A X_t)_{t \in I}$ converges in mean to $1_A X$. ∎


11.4.6. Corollary. The random variable $X$ of Theorem 11.4.5 can be chosen to be measurable relative to the σ-algebra $\mathfrak{A}_\infty = \mathfrak{A}\bigl(\bigcup_{t \in I} \mathfrak{A}_t\bigr)$. It is then almost surely uniquely determined by the property (11.4.4).


Proof. The variable $X$ constructed in the proof of Theorem 11.4.5 is the limit of a sequence $(X_{t_{1/n}})_{n \in \mathbb{N}}$, convergent in mean, consisting of random variables of the martingale. All $X_{t_{1/n}}$ are $\mathfrak{A}_\infty$-measurable; by Theorem 2.7.5, a suitable subsequence $(X_{t_{1/n_k}})_{k \in \mathbb{N}}$ converges almost surely to $X$. Since the set of all $\omega \in \Omega$ for which $\lim_{k \to \infty} X_{t_{1/n_k}}(\omega)$ exists also lies in $\mathfrak{A}_\infty$, we can thus assume that $X$ is $\mathfrak{A}_\infty$-measurable.

If $X^*$ is another $\mathfrak{A}_\infty$-measurable, integrable random variable such that $X_t = E^{\mathfrak{A}_t}(X^*)$ almost surely for all $t \in I$, then

$$\int_A X \, dP = \int_A X_t \, dP = \int_A X^* \, dP$$

holds for all $t \in I$ and all $A \in \mathfrak{A}_t$. Thus the system $\mathfrak{D}$ of all sets $A \in \mathfrak{A}_\infty$ satisfying

$$\int_A X \, dP = \int_A X^* \, dP$$

is a Dynkin system containing the system $\mathfrak{E} = \bigcup_{t \in I} \mathfrak{A}_t$. Now $I$ is filtering to the right and therefore $\mathfrak{E}$ is $\cap$-stable: Two sets $A_1 \in \mathfrak{A}_{t_1}$ and $A_2 \in \mathfrak{A}_{t_2}$ from $\mathfrak{E}$ lie in one and the same σ-algebra $\mathfrak{A}_t$, provided $t \ge t_1$ and $t \ge t_2$. Thus, by Theorem 1.2.3, $\mathfrak{D} = \mathfrak{A}_\infty$, that is, $\int_A X \, dP = \int_A X^* \, dP$ for all $A \in \mathfrak{A}_\infty$. By choosing $A$ first equal to $\{X \ge X^*\}$ and then equal to $\{X \le X^*\}$, we obtain $X = X^*$ almost surely. ∎

⁸ In other words, we show that $(X_t)_{t \in I}$ satisfies the Cauchy criterion relative to convergence in mean (as $t$ "tends to infinity").


Remark. The proof of Theorem 11.4.5 tells us that the martingale $(X_t)_{t \in I}$ converges in mean along $I$ to $X$. In the case $I = \mathbb{N}$, we showed in Corollary 11.4.3 that we actually have almost sure convergence to $X$. A counterexample of J. Dieudonné [30] shows that for arbitrary $I$ the almost sure convergence is lost in general. In this counterexample $I$ is countable but not totally ordered.

Finally, we discuss a second type of convergence theorem. In contrast to the Convergence Theorems 11.4.1-11.4.3, here the set $-\mathbb{N}$ (with the usual ordering) of all negative integers plays the role of the index set $\mathbb{N}$ used there. It is remarkable that we shall be able to prove convergence under very weak hypotheses on the super-martingale under consideration.⁹ This is due mainly to the following:


⁹ It is interesting to introduce the set $\mathbb{N}$ of natural numbers as index set for a super-martingale $(X_n)_{n \in -\mathbb{N}}$ relative to $(\mathfrak{A}_n)_{n \in -\mathbb{N}}$. We set $Y_n = X_{-n}$ and $\mathfrak{B}_n = \mathfrak{A}_{-n}$ for every $n \in \mathbb{N}$. Then the sequence $(\mathfrak{B}_n)_{n \in \mathbb{N}}$ is antitone, and the crucial property (11.1.2) of a super-martingale is written in the form

$$E^{\mathfrak{B}_n}(Y_m) \le Y_n \quad \text{for all } m, n \in \mathbb{N} \text{ with } m < n.$$

Therefore super-martingales with $\mathbb{N}$ ($-\mathbb{N}$) as index set are often called "increasing" ("decreasing") super-martingales.


11.4.7. Lemma. For every super-martingale $(X_n)_{n \in -\mathbb{N}}$ relative to an isotone family $(\mathfrak{A}_n)_{n \in -\mathbb{N}}$ of σ-subalgebras of $\mathfrak{A}$, the following statements are equivalent:

(a) $\sup_{n \in -\mathbb{N}} E(X_n) < +\infty$;
(b) $\sup_{n \in -\mathbb{N}} E(|X_n|) < +\infty$;
(c) $(X_n)_{n \in -\mathbb{N}}$ is uniformly integrable.


Proof. (a) ⇔ (b): Since $X_n \le |X_n|$, (a) follows from (b). Thus we need only show that "(a) ⇒ (b)". For every $n \in -\mathbb{N}$, $|X_n| = X_n + 2X_n^-$ and hence

$$E(|X_n|) = E(X_n) + 2E(X_n^-).$$

By (11.1.6), $(X_n^-)_{n \in -\mathbb{N}}$ is a sub-martingale and therefore the family $(E(X_n^-))_{n \in -\mathbb{N}}$ is isotone. Consequently, $E(X_n^-) \le E(X_{-1}^-)$ for all $n \in -\mathbb{N}$, and hence

$$\sup_{n \in -\mathbb{N}} E(|X_n|) \le \sup_{n \in -\mathbb{N}} E(X_n) + 2E(X_{-1}^-) < \infty.$$


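(b) ⇒ (c): Due to the equivalence of (a) and (b), for every real number $\varepsilon > 0$ there is $k_\varepsilon \in -\mathbb{N}$ such that

$$\sup_{n \in -\mathbb{N}} E(X_n) \le \varepsilon + E(X_{k_\varepsilon}).$$

Moreover, for all $n \in -\mathbb{N}$ and all real numbers $\alpha > 0$,

$$\int_{\{|X_n| \ge \alpha\}} |X_n| \, dP = \int_{\{X_n \ge \alpha\}} X_n \, dP - \int_{\{X_n \le -\alpha\}} X_n \, dP$$

and

$$E(X_n) = \int_{\{X_n \ge \alpha\}} X_n \, dP + \int_{\{X_n < \alpha\}} X_n \, dP.$$

From the two equalities it follows that

$$\int_{\{|X_n| \ge \alpha\}} |X_n| \, dP = E(X_n) - \int_{\{X_n < \alpha\}} X_n \, dP - \int_{\{X_n \le -\alpha\}} X_n \, dP$$

and thus, using the defining inequality (11.1.3) for super-martingales,

$$\int_{\{|X_n| \ge \alpha\}} |X_n| \, dP \le \varepsilon + E(X_{k_\varepsilon}) - \int_{\{X_n < \alpha\}} X_{k_\varepsilon} \, dP - \int_{\{X_n \le -\alpha\}} X_{k_\varepsilon} \, dP$$

$$= \varepsilon + \int_{\{X_n \ge \alpha\}} X_{k_\varepsilon} \, dP - \int_{\{X_n \le -\alpha\}} X_{k_\varepsilon} \, dP \le \varepsilon + \int_{\{|X_n| \ge \alpha\}} |X_{k_\varepsilon}| \, dP$$

for all $n \in -\mathbb{N}$ with $n \le k_\varepsilon$. By the Chebyshev-Markov inequality we have

$$P\{|X_n| \ge \alpha\} \le \frac{1}{\alpha}\, E(|X_n|) \le \frac{1}{\alpha} \sup_{n \in -\mathbb{N}} E(|X_n|).$$

By Theorem 2.9.6 we have, uniformly for all $n \in -\mathbb{N}$,

$$\lim_{\alpha \to +\infty} \int_{\{|X_n| \ge \alpha\}} |X_{k_\varepsilon}| \, dP = 0.$$

By the above inequality, however, this implies

$$\lim_{\alpha \to +\infty} \int_{\{|X_n| \ge \alpha\}} |X_n| \, dP = 0, \quad \text{uniformly for all } n \in -\mathbb{N}$$

(the finitely many indices $n > k_\varepsilon$ cause no difficulty, since each $X_n$ is integrable). Therefore, according to Theorem 2.12.7, II, $(X_n)_{n \in -\mathbb{N}}$ is uniformly integrable.


(c) ⇒ (b): This follows directly from Theorem 2.12.7, III. ∎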


Now we can give an analog of Theorem 11.4.1 and its corollaries. For this, we introduce the σ-subalgebra

$$\mathfrak{A}_{-\infty} = \bigcap_{n \in -\mathbb{N}} \mathfrak{A}_n \tag{11.4.5}$$

for every isotone family $(\mathfrak{A}_n)_{n \in -\mathbb{N}}$ of σ-subalgebras of $\mathfrak{A}$.


11.4.8. Theorem. Every super-martingale $(X_n)_{n \in -\mathbb{N}}$ relative to a family $(\mathfrak{A}_n)_{n \in -\mathbb{N}}$ satisfying the condition

$$\sup_{n \in -\mathbb{N}} E(X_n) < +\infty \tag{11.4.6}$$

converges almost surely and in mean to an integrable random variable $X_{-\infty}$. Then $(X_n)_{n \in -\mathbb{N} \cup \{-\infty\}}$ is a super-martingale relative to the family $(\mathfrak{A}_n)_{n \in -\mathbb{N} \cup \{-\infty\}}$.


Proof. The proof of almost sure convergence of the super-martingale to an integrable random variable proceeds analogously to the proof of Theorem 11.4.1. In this case we have to apply the Doob inequality (11.3.3) to the finite families $(X_i)_{i=-k,\dots,-1}$, $k \in \mathbb{N}$. The integrability of the limit variable $X_{-\infty}$ follows in the same way as in the proof of Theorem 11.4.1, since from Lemma 11.4.7 we have the boundedness of the expected values $E(|X_n|)$, $n \in -\mathbb{N}$. If we take into account Theorem 2.12.4 and the uniform integrability of the super-martingale guaranteed by Lemma 11.4.7, we obtain convergence in mean. The rest of the assertion then follows analogously to the proof of Corollary 11.4.2. ∎

The situation in which $(X_n)_{n \in -\mathbb{N}}$ is a martingale is especially convenient. Then the family $(E(X_n))_{n \in -\mathbb{N}}$ of expectations is constant, and thus (11.4.6) is trivially satisfied. Hence we immediately obtain the corollary:


11.4.9. Corollary. Every martingale $(X_n)_{n \in -\mathbb{N}}$ relative to a family $(\mathfrak{A}_n)_{n \in -\mathbb{N}}$ converges almost surely and in mean to an integrable random variable $X_{-\infty}$. Then $(X_n)_{n \in -\mathbb{N} \cup \{-\infty\}}$ is a martingale relative to the family $(\mathfrak{A}_n)_{n \in -\mathbb{N} \cup \{-\infty\}}$.


PROBLEMS 


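1. Let $X$ be a real, integrable random variable on $(\Omega, \mathfrak{A}, P)$, and let $S$ be the set of all σ-subalgebras $\mathfrak{B}$ of $\mathfrak{A}$. Prove that the family $(E^{\mathfrak{B}}(X))_{\mathfrak{B} \in S}$ is uniformly integrable.

2. Let $(X_n)_{n \in \mathbb{N}}$ be a super-martingale relative to $(\mathfrak{A}_n)_{n \in \mathbb{N}}$ of random variables $X_n \ge 0$. Suppose that $\sup_{n \in \mathbb{N}} E(X_n^p) < +\infty$ for some real number $p \ge 1$. Prove: $(X_n)_{n \in \mathbb{N}}$ converges almost surely and in $p$th mean to a random variable $X_\infty \ge 0$ such that $E(X_\infty^p) < +\infty$ and $(X_n)_{n \in \mathbb{N} \cup \{\infty\}}$ is a super-martingale relative to $(\mathfrak{A}_n)_{n \in \mathbb{N} \cup \{\infty\}}$.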


11.5 APPLICATIONS 


We show the range of the theorems of the preceding section by applying 
them to several particular situations with which we are already partly 
familiar. 


(a) Convergence of sums of independent random variables 


As in Section 11.1, Example 4, let $(X_n)_{n \in \mathbb{N}}$ be a sequence of real, integrable random variables such that

$$E(X_1) \ge 0 \quad\text{and}\quad E(X_{n+1} \mid X_1, \dots, X_n) \ge 0, \text{ almost surely, for all } n \in \mathbb{N}. \tag{11.5.1}$$

The sequence of partial sums $S_n = X_1 + \dots + X_n$ is then a sub-martingale. Hence, by Theorem 11.4.1, $(S_n)$ converges almost surely to an integrable random variable if one of the following equivalent conditions is satisfied:

$$\sup_{n \in \mathbb{N}} E(|S_n|) < \infty, \tag{11.5.2}$$

$$\sup_{n \in \mathbb{N}} E(S_n^+) < \infty. \tag{11.5.3}$$

If the random variables are all centered, then $(S_n)$ is a martingale: since $E(E(X_{n+1} \mid X_1, \dots, X_n)) = E(X_{n+1}) = 0$, and because of (11.5.1), we then even have

$$E(X_{n+1} \mid X_1, \dots, X_n) = 0 \quad \text{almost surely.}$$

By Section 11.1, Example 3, this is the case in particular when the sequence $(X_n)$ is independent and all $X_n$ are centered. In this case the following condition is sufficient for (11.5.2) to hold and thus for almost sure convergence of $(S_n)$:

$$\sum_{n=1}^{\infty} V(X_n) < \infty, \tag{11.5.4}$$


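provided we assume that all $X_n$ are square integrable. By the Hölder inequality (for $p = q = 2$) we then have

$$[E(|S_n|)]^2 = [E(1 \cdot |S_n|)]^2 \le E(S_n^2) = V(S_n) = \sum_{i=1}^{n} V(X_i) \le \sum_{i=1}^{\infty} V(X_i)$$

and hence

$$\sup_{n \in \mathbb{N}} E(|S_n|) \le \Bigl[\sum_{i=1}^{\infty} V(X_i)\Bigr]^{1/2} < +\infty.$$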
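The bound just derived can be checked numerically. The following Python sketch rests on our own assumptions (the random-sign series $X_i = \varepsilon_i / i$ with independent signs $\varepsilon_i = \pm 1$, the sample sizes, and the function name are illustrative choices): here $\sum V(X_i) = \pi^2/6$, so (11.5.4) holds, and a Monte Carlo estimate of $E(|S_n|)$ should stay below $(\pi^2/6)^{1/2}$.

```python
import math
import random


def mean_abs_partial_sum(n_terms, n_paths, seed=0):
    """Monte Carlo estimate of E|S_n| for S_n = sum_{i<=n} eps_i / i,
    where the eps_i are independent signs +1 or -1 (probability 1/2 each)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s = sum(rng.choice((-1.0, 1.0)) / i for i in range(1, n_terms + 1))
        total += abs(s)
    return total / n_paths


bound = math.sqrt(math.pi ** 2 / 6)       # [sum of V(X_i)]^(1/2)
estimate = mean_abs_partial_sum(200, 2000)
print(estimate, bound)
```

The estimate only illustrates the inequality; the almost sure convergence of the series itself is the content of Theorem 11.4.1.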
The relationship between the above and our considerations of the strong 


law of large numbers becomes clear when we recall the following lemma 
from analysis, the so-called Kronecker Lemma: 


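11.5.1. Lemma. Let $(x_n)_{n \in \mathbb{N}}$ and $(\tau_n)_{n \in \mathbb{N}}$ be two sequences of real numbers where the second is isotone, contains only numbers $\tau_n > 0$, and diverges to $+\infty$. Then

$$\sum_{i=1}^{\infty} \frac{x_i}{\tau_i} \text{ convergent} \;\Longrightarrow\; \lim_{n \to \infty} \frac{1}{\tau_n} \sum_{i=1}^{n} x_i = 0. \tag{11.5.5}$$


Proof. Let $s_n = \sum_{i=1}^{n} (x_i/\tau_i)$ and $s = \lim s_n = \sum_{i=1}^{\infty} (x_i/\tau_i)$. Then we know that the sequence

$$\frac{c_1 s_1 + \dots + c_n s_n}{c_1 + \dots + c_n}$$

also converges to $s$ if $(c_n)$ is a sequence of real numbers $\ge 0$ with $\sum c_n$ divergent. Thus we have

$$\lim_{n \to \infty} \frac{\tau_1 s_1 + (\tau_2 - \tau_1) s_1 + (\tau_3 - \tau_2) s_2 + \dots + (\tau_{n+1} - \tau_n) s_n}{\tau_{n+1}} = s$$

and hence, because $\lim_{n \to \infty} \dfrac{\tau_1 s_1}{\tau_{n+1}} = 0$,

$$\lim_{n \to \infty} \frac{(\tau_2 - \tau_1) s_1 + (\tau_3 - \tau_2) s_2 + \dots + (\tau_{n+1} - \tau_n) s_n}{\tau_{n+1}} = s.$$

The assertion now follows since

$$\frac{1}{\tau_{n+1}} \sum_{i=1}^{n+1} x_i = s_{n+1} - \frac{(\tau_2 - \tau_1) s_1 + \dots + (\tau_{n+1} - \tau_n) s_n}{\tau_{n+1}}. \; ∎$$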
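A quick numerical illustration of the Kronecker Lemma: in the Python sketch below we take, as our own hypothetical example, $x_i = (-1)^i \sqrt{i}$ and $\tau_i = i$. The series $\sum x_i/\tau_i = \sum (-1)^i/\sqrt{i}$ converges (alternating series), and the lemma then forces $(1/n) \sum_{i \le n} x_i \to 0$ even though the $x_i$ are unbounded.

```python
import math


def kronecker_demo(n):
    """For x_i = (-1)**i * sqrt(i) and tau_i = i, return the pair
    (sum_{i<=n} x_i / tau_i, (1 / tau_n) * sum_{i<=n} x_i)."""
    series = 0.0   # partial sum of x_i / tau_i
    plain = 0.0    # partial sum of x_i
    for i in range(1, n + 1):
        x = (-1) ** i * math.sqrt(i)
        series += x / i
        plain += x
    return series, plain / n


for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    print(n, kronecker_demo(n))
```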


Now let $(X_n)_{n \in \mathbb{N}}$ be a sequence of real, integrable, centered random variables satisfying

$$E(X_1) = 0 \quad\text{and}\quad E(X_{n+1} \mid X_1, \dots, X_n) = 0, \text{ almost surely, for all } n \in \mathbb{N}, \tag{11.5.6}$$

and let $(\tau_n)_{n \in \mathbb{N}}$ be a sequence as in Lemma 11.5.1. Then obviously

$$E\Bigl(\frac{1}{\tau_1} X_1\Bigr) = 0 \quad\text{and}\quad E\Bigl(\frac{1}{\tau_{n+1}} X_{n+1} \Bigm| \frac{1}{\tau_1} X_1, \dots, \frac{1}{\tau_n} X_n\Bigr) = 0, \text{ almost surely } (n \in \mathbb{N}),$$

so that $\bigl(\sum_{i=1}^{n} (1/\tau_i) X_i\bigr)_{n \in \mathbb{N}}$ is a martingale. If this converges almost surely, then by the Kronecker Lemma

$$\lim_{n \to \infty} \frac{1}{\tau_n} \sum_{i=1}^{n} X_i = 0, \quad \text{almost surely.}$$

Because the $X_i$ are centered, this is the strong law of large numbers for the sequence $\tau_n = n$. Thus condition (11.5.2) for $S_n = \sum_{i=1}^{n} (1/\tau_i) X_i$ is sufficient for almost sure convergence of $(1/\tau_n) \sum_{i=1}^{n} X_i$ to zero. Thus we have derived the strong law of large numbers under weaker independence conditions, namely, those formulated in (11.5.6).

If the sequence $(X_n)$ is independent, then our observation concerning (11.5.4) tells us that also the condition

$$\sum_{n=1}^{\infty} \frac{1}{\tau_n^2}\, V(X_n) < \infty \tag{11.5.7}$$

is sufficient for almost sure convergence of $(1/\tau_n) \sum_{i=1}^{n} X_i$ to zero. For $\tau_n = n$ this is Theorem 6.4.1 of Kolmogorov.

(b) Sums of independent, identically distributed random variables


We can also derive Kolmogorov's Theorem 6.4.2 from the martingale theorems. As in Theorem 6.4.2, let $(X_n)_{n \in \mathbb{N}}$ be an independent sequence of real, integrable random variables with the same distribution $\mu$. We assume every $X_n$ to be centered and then have to prove almost sure convergence of $(1/n) \sum_{i=1}^{n} X_i$ to 0. We again set $S_n = X_1 + \dots + X_n$ for all $n$.

For each $n \in \mathbb{N}$, we use $\mathfrak{A}_{-n}$ to denote the σ-algebra generated by all $S_m$ with $m \ge n$. Then $(\mathfrak{A}_n)_{n \in -\mathbb{N}}$ is obviously an isotone sequence of σ-subalgebras of the σ-algebra $\mathfrak{A}$ of the probability space $(\Omega, \mathfrak{A}, P)$.

Since $S_{m+1} - S_m = X_{m+1}$ ($m \in \mathbb{N}$), we have

$$E(X_1 \mid \mathfrak{A}_{-n}) = E(X_1 \mid S_n, X_{n+1}, X_{n+2}, \dots) \quad \text{almost surely.}$$

Moreover, $X_1$ and $S_n$ are independent of the σ-algebra generated by $X_{n+1}, X_{n+2}, \dots$. Therefore, by Theorem 10.1.4,

$$E(X_1 \mid S_n, X_{n+1}, X_{n+2}, \dots) = E(X_1 \mid S_n) \quad \text{almost surely,}$$

and hence

$$E(X_1 \mid \mathfrak{A}_{-n}) = E(X_1 \mid S_n), \quad \text{almost surely } (n \in \mathbb{N}). \tag{11.5.8}$$

As we shall show shortly, we also have

$$E(X_j \mid S_n) = E(X_1 \mid S_n), \quad \text{almost surely } (j = 1, \dots, n), \tag{11.5.9}$$

due to the special hypotheses. From (11.5.8) and (11.5.9) it now follows that

$$E(X_1 \mid \mathfrak{A}_{-n}) = \frac{1}{n} \sum_{j=1}^{n} E(X_j \mid S_n) = E\Bigl(\frac{1}{n} \sum_{j=1}^{n} X_j \Bigm| S_n\Bigr) = \frac{1}{n} S_n \quad \text{almost surely.}$$

Now Theorem 11.4.4 and Corollary 11.4.9 apply, and thus the martingale $(E(X_1 \mid \mathfrak{A}_n))_{n \in -\mathbb{N}}$ converges almost surely and in mean to an integrable random variable $X_{-\infty}$; hence, we have

$$\lim_{n \to \infty} \frac{1}{n} S_n = X_{-\infty} \quad \text{almost surely.}$$

The convergence in mean implies $E(X_{-\infty}) = \lim E((1/n) S_n) = 0$ since the variables $X_n$ are assumed to be all centered. The considerations after the Kolmogorov 0-1 Law (Theorem 6.2.3) yield that $X_{-\infty}$ is measurable relative to the σ-algebra of terminal events determined by $(X_n)$. Therefore, by the 0-1 Law, $X_{-\infty}$ is almost surely constant.¹⁰ Hence and from $E(X_{-\infty}) = 0$, it follows that $X_{-\infty} = 0$ almost surely.

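The proof of (11.5.9) still to be given is an application of the Transformation Theorem for integrals. Obviously we need to show only that $\int_A X_j \, dP = \int_A X_1 \, dP$ for all $A \in \mathfrak{A}(S_n)$ and $j = 1, \dots, n$, that is, for all sets $A = S_n^{-1}(B)$ with Borel $B \subset \mathbb{R}$. But now

$$\int_A X_j \, dP = \int (1_B \circ S_n)\, X_j \, dP = \int \bigl(1_B \circ s \circ (X_1 \otimes \dots \otimes X_n)\bigr) \bigl(p_j \circ (X_1 \otimes \dots \otimes X_n)\bigr) \, dP = \int (1_B \circ s)\, p_j \, dP_{X_1 \otimes \dots \otimes X_n}$$

if $s$ and $p_j$ denote the mappings $(x_1, \dots, x_n) \mapsto x_1 + \dots + x_n$ and $(x_1, \dots, x_n) \mapsto x_j$ of $\mathbb{R}^n$ into $\mathbb{R}$, respectively. Because of the independence of all $X_n$, the joint distribution $P_{X_1 \otimes \dots \otimes X_n}$ is equal to $\mu \otimes \dots \otimes \mu$. Thus, using Fubini's Theorem, we obtain the value

$$\int_A X_j \, dP = \int \dots \int 1_B(x_1 + \dots + x_n)\, x_j \, \mu(dx_1) \dots \mu(dx_n) = \int \mu^{*(n-1)}(B - x_j)\, x_j \, \mu(dx_j) = \int \mu^{*(n-1)}(B - x)\, x \, \mu(dx),$$

independent of $j$, where $\mu^{*(n-1)}$ denotes the $(n-1)$-fold convolution product of $\mu$ with itself. Thus we have proved Kolmogorov's Theorem 6.4.2.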
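The symmetry statement (11.5.9), in the form $E(X_j \mid S_n) = S_n/n$, can be verified exactly for a small discrete distribution. The Python sketch below is an illustration under our own assumptions: the three-point distribution $\mu$ (values $-1, 0, 2$ with probabilities $1/2, 1/4, 1/4$, which is centered), the choice $n = 3$ and the function name are hypothetical. It enumerates all outcomes and computes the conditional means on each event $\{S_n = s\}$.

```python
from fractions import Fraction
from itertools import product


def conditional_means(support, probs, n):
    """For i.i.d. X_1..X_n with P(X = support[i]) = probs[i], return the dict
    {s: [E(X_1 | S_n = s), ..., E(X_n | S_n = s)]} in exact arithmetic."""
    mass = {}       # s -> P(S_n = s)
    weighted = {}   # s -> [E(X_j ; S_n = s) for j = 1..n]
    for combo in product(range(len(support)), repeat=n):
        p = Fraction(1)
        for i in combo:
            p *= probs[i]
        xs = [support[i] for i in combo]
        s = sum(xs)
        mass[s] = mass.get(s, 0) + p
        w = weighted.setdefault(s, [Fraction(0)] * n)
        for j, x in enumerate(xs):
            w[j] += p * x
    return {s: [wj / mass[s] for wj in weighted[s]] for s in mass}


support = [-1, 0, 2]                                       # hypothetical values
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]   # mean 0
table = conditional_means(support, probs, 3)
for s in sorted(table):
    print(s, table[s])    # every entry equals s / 3
```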


¹⁰ Since $\{X_{-\infty} \le n\} \uparrow \Omega$ and $\{X_{-\infty} \le -n\} \downarrow \emptyset$ ($n \in \mathbb{N}$), the set of all $\alpha \in \mathbb{R}$ with $P\{X_{-\infty} \le \alpha\} = 1$ is nonempty and bounded below. For the infimum $\alpha_0$ of these $\alpha$ we therefore have $P\{X_{-\infty} \le \alpha_0\} = 1$ and $P\{X_{-\infty} \le \beta\} = 0$ for all $\beta < \alpha_0$, that is, $P\{X_{-\infty} < \alpha_0\} = 0$ and hence $P\{X_{-\infty} = \alpha_0\} = 1$.


(c) Connections with differentiation theory


First, we note that our presentation of martingale theory uses the existence of conditional expectations only in Theorem 11.4.4. In all other cases, they serve only to shorten the statements of definitions [as in (11.1.2)] or results. Thus, with the exception of Theorem 11.4.4, the convergence theorems of Section 11.4 do not need the Radon-Nikodym Theorem in their proofs. It is therefore noteworthy that martingale theory provides us with a new proof of the Radon-Nikodym Theorem (for the case of finite measures, which was the essential case in our earlier proof).

Thus, let $(\Omega, \mathfrak{A}, P)$ be a probability space and $Q$ a finite, $P$-continuous measure on $\mathfrak{A}$. We refer to Section 11.1, Example 5 and consider the partially ordered set $I$ of all finite decompositions of $\Omega$ into measurable sets, the associated isotone family $(\mathfrak{A}_t)_{t \in I}$ of finite σ-subalgebras of $\mathfrak{A}$, and the martingale $(X_t)_{t \in I}$, defined via $Q$, relative to $(\mathfrak{A}_t)_{t \in I}$. We note that:


1. $I$ is filtering to the right. If $s = (B_i)_{i=1,\dots,m}$ and $t = (C_j)_{j=1,\dots,n}$ are two partitions from $I$, then $u = (B_i \cap C_j)_{i=1,\dots,m;\ j=1,\dots,n}$ is another element of $I$ which satisfies $s \le u$ and $t \le u$.

2. We have $\mathfrak{A} = \bigcup_{t \in I} \mathfrak{A}_t$. Thus, in particular, $\mathfrak{A} = \mathfrak{A}_\infty$ in the sense of the notation of Corollary 11.4.6. Indeed, every $A \in \mathfrak{A}$ lies in the σ-algebra $\mathfrak{A}_{t_A}$ associated with the decomposition $t_A = (A, \complement A)$.

3. For every $\varepsilon > 0$, by Theorem 2.9.6 there exists a $\delta > 0$ such that $P(A) \le \delta$ implies $Q(A) < \varepsilon$ ($A \in \mathfrak{A}$).

4. The martingale $(X_t)_{t \in I}$ is uniformly integrable. For every real $\alpha > 0$ and every $t = (B_i) \in I$, we have

$$\int_{\{X_t \ge \alpha\}} X_t \, dP = \sum_{i:\ X_t \ge \alpha \text{ on } B_i} Q(B_i) = Q\{X_t \ge \alpha\}$$

and

$$P\{X_t \ge \alpha\} \le \frac{1}{\alpha}\, E(X_t) = \frac{1}{\alpha}\, Q(\Omega).$$

We need note only that $E(X_t)$ is independent of $t$. For $t = (\emptyset, \Omega) \in I$, we compute the constant value to be $Q(\Omega)$. Uniform integrability now follows directly from property 3 above.

Using Theorem 11.4.5 and martingale theory, we find that there exists a random variable $X \in \mathcal{L}^1(P)$ such that $X_t = E^{\mathfrak{A}_t}(X)$ almost surely, that is, such that

$$\int_A X_t \, dP = \int_A X \, dP$$

for all $A \in \mathfrak{A}_t$ and every $t \in I$. For $A \in \mathfrak{A}$ and the decomposition $t_A$ of property 2 we thus obtain

$$Q(A) = \int_A X_{t_A} \, dP = \int_A X \, dP.$$

But then $X$ is $P$-almost surely equal to the density of $Q$ relative to $P$, and thus the Radon-Nikodym Theorem is proved anew.


We can work with the martingale $(X_t)_{t \in J}$ in the above proof, where $J$ is a subset of $I$ filtering to the right such that $\bigcup_{t \in J} \mathfrak{A}_t$ is still a generator of $\mathfrak{A}$. Since this generator is $\cap$-stable, $Q(A) = \int_A X \, dP$ for all $A \in \bigcup_{t \in J} \mathfrak{A}_t$ implies this equality for all $A \in \mathfrak{A}$.

If, in particular, $(X_t)_{t \in J}$ is a sequence, then we also have available Corollary 11.4.3. Accordingly, $(X_t)_{t \in J}$ converges almost surely to the density $X$. This is the case when $\mathfrak{A}$ is countably generated. If $(A_n)_{n \in \mathbb{N}}$ is a sequence of events generating $\mathfrak{A}$, then we choose for $t_n$ the decomposition of $\Omega$ "generated" by $A_1, \dots, A_n$. Since $\mathfrak{A}_{t_n}$ contains $A_1, \dots, A_n$, this proves that $\mathfrak{A}$ is generated by $\bigcup_{n=1}^{\infty} \mathfrak{A}_{t_n}$. Thus we can choose $J = \{t_1, t_2, \dots\}$.

If we make use of these observations, then in certain cases the density $X$ appears not only formally as above, but explicitly as a derivative. Therefore we speak of the Radon-Nikodym derivative $X$. An example of this is the following:


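Example


Let $\Omega = [0,1[$, $\mathfrak{A} = \Omega \cap \mathfrak{B}^1$ and $P = \lambda^1_\Omega$, and let $Q$ be a $P$-continuous measure on $\mathfrak{A}$. We examine it with the help of its distribution function

$$F(x) = Q([0,x[) \qquad (x \in [0,1[),$$

which appears here restricted to $[0,1[$. For every $n = 1, 2, \dots$ we choose a finite sequence $\mathfrak{z}^{(n)} = (x_0^{(n)}, \dots, x_{k_n}^{(n)})$ of points

$$0 = x_0^{(n)} < x_1^{(n)} < \dots < x_{k_n}^{(n)} = 1$$

such that all elements of the sequence $\mathfrak{z}^{(n)}$ occur in $\mathfrak{z}^{(n+1)}$ and such that

$$\lim_{n \to \infty} \max_{0 \le j < k_n} \bigl(x_{j+1}^{(n)} - x_j^{(n)}\bigr) = 0.$$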
Each of the sequences §™ defines a partition tn» of € consisting of the sets 


It ang e: 0, , k, — 1. For the associated (finite) c-algebras 


Hi we nes obviously have A., C A, and A = A OD) 9t, ). The 
n=1 


n+l 


density X of Q relative to P is P-almost surely equal to lim X,, by Corol- 


n— o 


lary 11.4.3. But now, for every x € [0,1[ and every n € N, 


Q([ 2 "Eas ERICH = Fa? 


X E 
a = Peau) ^ xp cap 
where j+ denotes the uniquely determined number from (0, . . . , kn — 1} 
with x € [zf?, x\"\ ,[. It can then be derived from the Convergence Theorem 


11.4.3 that F is differentiable at A'-almost all points from [0,1[ and X(x) 
is its derivative A-almost everywhere. We will not present a proof of 
that fact. 
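
The difference quotients in this example are easy to compute. The following is a minimal sketch in Python, assuming dyadic partitions of [0,1[ and a hypothetical distribution function F(x) = x^2 (with density 2x); neither choice comes from the text, and the function names are illustrative only.

```python
def difference_quotient(F, n, x):
    """X_n(x) for the dyadic partition of [0,1[ into 2**n cells.

    Returns (F(right) - F(left)) / (right - left) for the cell
    [left, right[ of the n-th partition that contains x.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1[")
    cells = 2 ** n
    j = int(x * cells)                 # index of the cell containing x
    left, right = j / cells, (j + 1) / cells
    return (F(right) - F(left)) / (right - left)


def F(x):
    """Hypothetical distribution function with density 2x on [0,1[."""
    return x * x


# Quotients at x = 0.3 along finer and finer partitions; they approach
# the value 2 * 0.3 of the density at that point.
approximations = [difference_quotient(F, n, 0.3) for n in (2, 6, 10, 20)]
```

On each cell of the n-th partition the quotient is the average of the quotients on the two cells of the (n+1)-st partition contained in it, which is the martingale property of (X_n) in this special case.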

This example can also be modified in many ways and can then be used,
via Corollary 11.4.3, for the proof of the theorem that every monotone
function on an interval in $\mathbb{R}$ is $\lambda^1$-almost everywhere
differentiable.

PROBLEMS


1. Let $(X_n)_{n\in\mathbb{N}}$ be an independent sequence of real random
variables such that $\sum_{n=1}^{\infty}|X_n|$ is integrable. Define for each
$n\in\mathbb{N}$:

    $$\mathfrak{A}_{-n}=\mathfrak{A}(X_n,X_{n+1},\dots),\qquad
      Y_{-n}=E\Bigl(\sum_{i=1}^{\infty}X_i \,\Big|\, \mathfrak{A}_{-n}\Bigr).$$

Prove:

(a) $Y_{-n}=\sum_{i=n}^{\infty}X_i+E(X_1+\dots+X_{n-1})$, for all $n\in\mathbb{N}$.
(b) $(Y_n)_{n\in-\mathbb{N}}$ is a martingale relative to $(\mathfrak{A}_n)_{n\in-\mathbb{N}}$.
(c) $(Y_n)$ converges almost surely and in the mean to an integrable
random variable. Determine this variable explicitly.
3. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of real, integrable random
variables. Define a new sequence $(Y_n)_{n\in\mathbb{N}}$ as follows:

    $$Y_1=X_1;\qquad
      Y_n=X_n-\sum_{i=2}^{n}\bigl[E(X_i\mid X_1,\dots,X_{i-1})-X_{i-1}\bigr]\quad(n\ge 2).$$

Prove that $(Y_n)_{n\in\mathbb{N}}$ is a martingale. Give conditions which imply
that $(Y_n)$ converges almost surely.


12 


STOCHASTIC PROCESSES 


The concept of stochastic process is basic to many branches of proba- 
bility theory. It is at the same time so simple that the reader will wonder 
why this concept was not introduced earlier. The reason is that the con- 
cepts of conditional expectation and of kernel are fundamental in answer- 
ing the questions characteristic of the theory of stochastic processes. Use 
of the term “stochastic process" only as an abbreviation would have been 
of no advantage in the previous material. 


12.1 DEFINITION AND CONSTRUCTION 
OF STOCHASTIC PROCESSES 


The problems of probability theory treated so far mostly involved a
sequence $(X_n)_{n\in\mathbb{N}}$ of random variables with values in a
measurable space $(\Omega',\mathfrak{A}')$. Here we often interpret $X_n$ as
the outcome of a random experiment carried out at "points of time"
$n=1,2,\dots$. But it is often necessary to handle a random occurrence with
continuous (and not only discrete) time with the methods of probability
theory. A typical example from physics is the Brownian motion of a particle
(for example, a molecule) in a liquid or a gas, a motion due to molecular
collisions. We can describe the position of the particle at time $t\ge 0$ by
a random variable $X_t$ with values in $\mathbb{R}^3$. The family
$(X_t)_{t\in\mathbb{R}_+}$ can then be used as a probability-theoretic model
for the investigation of Brownian motion. Here it is assumed that the random
variables $X_t$ are all defined on one and the same probability space
$(\Omega,\mathfrak{A},P)$. The probability measure $P$ "controls" the motion
of the Brownian particle. The various paths that the particle can travel are
described by $t\mapsto X_t(\omega)$, that is, by a mapping of $\mathbb{R}_+$
into $\mathbb{R}^3$, where $\omega$ is arbitrarily chosen in $\Omega$. Such
considerations lead to the following:


12.1.1. Definition. Every quadruple $(\Omega,\mathfrak{A},P,(X_t)_{t\in I})$,
where $(\Omega,\mathfrak{A},P)$ is a probability space and $(X_t)_{t\in I}$ is
a family of random variables on this space with values all in a measurable
space $(E,\mathfrak{B})$, is called a stochastic process (or, simply,
process). We call $I$ the parameter- or time-set and $E$ [or, more precisely,
$(E,\mathfrak{B})$] the state space of the stochastic process. For every
$\omega\in\Omega$, the mapping from $I$ into $E$ defined by
$t\mapsto X_t(\omega)$ is called a path¹ of the process.


Since we generally think of the probability space $(\Omega,\mathfrak{A},P)$ as
given, we often denote a stochastic process
$(\Omega,\mathfrak{A},P,(X_t)_{t\in I})$ simply by $(X_t)_{t\in I}$ or even
just $(X_t)$.

For $I=\mathbb{N}$ we thus obtain the sequences of random variables with
values in a fixed measurable space as a special case. Brownian motion is to
be considered as a stochastic process with state space $\mathbb{R}^3$ and
parameter set $\mathbb{R}_+$. Further, every super- or sub-martingale is a
stochastic process. The Doob inequalities contain results about the behavior
of the paths of a super-martingale.


Now suppose we are given a stochastic process
$(\Omega,\mathfrak{A},P,(X_t)_{t\in I})$ with arbitrary parameter set
$I\ne\emptyset$ and arbitrary state space $(E,\mathfrak{B})$. For every
nonempty subset $J$ of $I$, let $E^J$ denote the set of all mappings from $J$
into $E$, that is, the product set $\prod_{t\in J}E_t$ in which every factor
$E_t$ is equal to $E$. Correspondingly, we let $\mathfrak{B}^J$ denote the
$\sigma$-algebra $\bigotimes_{t\in J}\mathfrak{B}_t$ in $E^J$, where every
$\mathfrak{B}_t=\mathfrak{B}$. Further let

    $$X_J=\bigotimes_{t\in J}X_t$$    (12.1.1)

be the product mapping from $\Omega$ into $E^J$ already introduced in
Section 5.4. This is an $(E^J,\mathfrak{B}^J)$-random variable. The joint
distribution of the family $(X_t)_{t\in J}$ of random variables is the
distribution of $X_J$, namely,

    $$P_J=X_J(P).$$    (12.1.2)

There are simple relationships between the probability spaces
$(E^J,\mathfrak{B}^J,P_J)$: Let $J$ and $H$ be nonempty subsets of $I$ with
$J\subset H$ and let

    $$p^H_J\colon E^H\to E^J$$    (12.1.3)


1 The terms "trajectory" and "realization of the process" are also commonly
used. Since to every $\omega\in\Omega$ there corresponds a path and hence a
mapping (from $I$ into $E$), stochastic processes are sometimes also called
random functions or mappings.

be the $\mathfrak{B}^H$-$\mathfrak{B}^J$-measurable projection mapping. (With
every element of $E^H$, that is, every mapping from $H$ into $E$, it
associates its restriction to $J$.) For $H=I$ we let

    $$p_J=p^I_J.$$    (12.1.4)


Since, for any two nonempty subsets $J$ and $H$ of $I$ with $J\subset H$, we
see that $X_J=p^H_J\circ X_H$ holds, then in view of (12.1.2) we obtain

    $$P_J=p^H_J(P_H)$$    (12.1.5)

and for $H=I$, in particular,

    $$P_J=p_J(P_I).$$    (12.1.6)

These are the relationships announced above between the probability
spaces $(E^J,\mathfrak{B}^J,P_J)$. For every finite, nonempty subset
$J=\{t_1,\dots,t_n\}$ of $I$, the measure $P_J$ has a simple interpretation.
For any $n$ sets $B_1,\dots,B_n\in\mathfrak{B}$,

    $$P_J(B_1\times\dots\times B_n)=P\bigl(X_J^{-1}(B_1\times\dots\times B_n)\bigr)
      =P\{X_{t_1}\in B_1,\dots,X_{t_n}\in B_n\}.$$    (12.1.7)²


Thus if we interpret the process $(X_t)_{t\in I}$ as random motion of a
particle in $E$, then $P_J$ is the only probability measure on
$\mathfrak{B}^J$ such that for any $n$ sets $B_1,\dots,B_n\in\mathfrak{B}$,
the number $P_J(B_1\times\dots\times B_n)$ is equal to the probability that
the particle finds itself in $B_1$ at time $t_1$, in $B_2$ at time
$t_2$, ..., in $B_n$ at time $t_n$. If $\mathfrak{H}=\mathfrak{H}(I)$ denotes
the set of all nonempty, finite subsets of $I$, it thus becomes clear why we
call $(P_J)_{J\in\mathfrak{H}(I)}$ or also
$(E^J,\mathfrak{B}^J,P_J)_{J\in\mathfrak{H}(I)}$ the family of
finite-dimensional distributions of the process. This family is projective
in the sense of the following definition:


12.1.2. Definition. If condition (12.1.5) holds for a family
$(P_J)_{J\in\mathfrak{H}(I)}$ of probability measures on the measurable spaces
$(E^J,\mathfrak{B}^J)$ for any two sets $J,H\in\mathfrak{H}(I)$ satisfying
$J\subset H$, then the family $(P_J)$ is said to be projective.
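
The projectivity condition can be checked by hand in small cases. The following Python sketch assumes a hypothetical two-state Markov chain with time set {0, 1, 2, ...}; the state space E, the initial law INIT, the one-step matrix STEP and all function names are illustrative choices, not objects from the text. It computes P_J for finite time sets J and compares P_J with the image of P_H under the projection from E^H onto E^J.

```python
from itertools import product

E = (0, 1)                                  # hypothetical state space
INIT = {0: 0.5, 1: 0.5}                     # law of the state at time 0
STEP = {0: {0: 0.9, 1: 0.1},                # one-step transition probabilities
        1: {0: 0.2, 1: 0.8}}


def n_step(x, y, n):
    """Probability of moving from x to y in exactly n steps."""
    row = {z: 1.0 if z == x else 0.0 for z in E}
    for _ in range(n):
        row = {z: sum(row[u] * STEP[u][z] for u in E) for z in E}
    return row[y]


def fdd(J):
    """P_J as a dict: tuples of states (ordered by time) -> probability."""
    J = tuple(sorted(J))
    dist = {}
    for xs in product(E, repeat=len(J)):
        p = sum(INIT[x0] * n_step(x0, xs[0], J[0]) for x0 in E)
        for k in range(1, len(J)):
            p *= n_step(xs[k - 1], xs[k], J[k] - J[k - 1])
        dist[xs] = p
    return dist


def project(dist_H, H, J):
    """Image of P_H under the projection E^H -> E^J (sum out H minus J)."""
    H, J = tuple(sorted(H)), tuple(sorted(J))
    keep = [H.index(t) for t in J]
    out = {}
    for xs, p in dist_H.items():
        key = tuple(xs[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def is_projective_pair(H, J, tol=1e-12):
    """Check condition (12.1.5) for one pair J subset H, up to rounding."""
    p_J, image = fdd(J), project(fdd(H), H, J)
    return all(abs(p_J[k] - image[k]) < tol for k in p_J)
```

For example, `is_projective_pair({0, 2, 5}, {2, 5})` compares the joint law at times 2 and 5 with the corresponding marginal of the joint law at times 0, 2 and 5.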


Remark. 1. In this definition, it would be sufficient to require (12.1.5)
only for sets $J,H\in\mathfrak{H}$ such that $H$ has exactly one element more
than $J$. If $J$ and $H$ are sets from $\mathfrak{H}$ with $J\subset H$ and
if, say, $H\setminus J$ contains exactly $n$ elements, then we determine sets
$H_0,H_1,\dots,H_n$ such that $H_0=J\subset H_1\subset\dots\subset H_n=H$ and
$H_i\setminus H_{i-1}$ always has only one element. The assertion then
follows from

    $$p^H_J=p^{H_1}_{H_0}\circ p^{H_2}_{H_1}\circ\dots\circ p^{H_n}_{H_{n-1}}.$$


2 Here we set $\{X_{t_1}\in B_1,\dots,X_{t_n}\in B_n\}
=\{X_{t_1}\in B_1\}\cap\dots\cap\{X_{t_n}\in B_n\}$.

A crucial question for the construction of processes is whether, under
certain hypotheses on the measurable space $(E,\mathfrak{B})$ and the set
$I$, every projective family $(P_J)_{J\in\mathfrak{H}(I)}$ of probability
measures on the measurable spaces $(E^J,\mathfrak{B}^J)$ is the family of
finite-dimensional distributions of a stochastic process on a suitable
probability space with state space $(E,\mathfrak{B})$ and parameter set $I$.
This question plays the same role in the theory of processes as the
question, answered positively by the results in Section 5.4, of whether
every family $(Q_i)_{i\in I}$ of probability measures is the family of
distributions of an independent family $(Y_i)_{i\in I}$ of random variables.


Our question is answered in the generality suitable to our purposes by
the following theorem of Kolmogorov and a corollary.


12.1.3. Theorem. If $E$ is a Polish space, $\mathfrak{B}$ is the
$\sigma$-algebra of its Borel sets and $I$ is an arbitrary nonempty set, then
for every projective family $(P_J)_{J\in\mathfrak{H}(I)}$ of probability
measures on $(E^J,\mathfrak{B}^J)$ there exists exactly one probability
measure $P_I$ on $(E^I,\mathfrak{B}^I)$ satisfying

    $$p_J(P_I)=P_J\quad\text{for all } J\in\mathfrak{H}(I).$$    (12.1.8)

We call $P_I$ the projective limit of the family
$(P_J)_{J\in\mathfrak{H}(I)}$ and write

    $$\varprojlim P_J=\varprojlim_{J\in\mathfrak{H}(I)}P_J=P_I.$$    (12.1.9)


Proof. Similar to the proof of Theorem 5.4.2, let
$\mathfrak{Z}_J=p_J^{-1}(\mathfrak{B}^J)$ be the $\sigma$-algebra of the
so-called $J$-cylinders and let
$\mathfrak{Z}=\bigcup_{J\in\mathfrak{H}(I)}\mathfrak{Z}_J$. Then $\mathfrak{Z}$
is again an algebra in $E^I$ since $J,H\in\mathfrak{H}(I)$ and $J\subset H$
imply $\mathfrak{Z}_J\subset\mathfrak{Z}_H$ due to the
$\mathfrak{B}^H$-$\mathfrak{B}^J$-measurability of $p^H_J$. Moreover, by the
definition of $\mathfrak{B}^I$, $\mathfrak{Z}$ is a generator of
$\mathfrak{B}^I$ and thus $\mathfrak{B}^I=\mathfrak{A}(\mathfrak{Z})$.
Condition (12.1.8) on $P_I$ tells us that $P_I$ is a probability measure on
$\mathfrak{B}^I$ with $P_I(Z)=P_J(B)$ if $Z=p_J^{-1}(B)$ is a cylinder set
($B\in\mathfrak{B}^J$). It now follows from the Uniqueness Theorem 1.5.5
that there can be at most one such probability measure $P_I$. The existence
of $P_I$ is obtained from the Extension Theorem 1.5.2 if we can show that

    $$P_0(Z)=P_J(B)\qquad(Z=p_J^{-1}(B),\ B\in\mathfrak{B}^J,\ J\in\mathfrak{H}(I))$$

defines a premeasure on $\mathfrak{Z}$ with $P_0(E^I)=1$. Due to the
projectivity of the family $(P_J)$, we see that $P_0$ is well defined on
$\mathfrak{Z}$ and is a content. For this, we need only repeat the
corresponding reasoning in the proof of Theorem 5.4.2. Since
$E^I=p_J^{-1}(E^J)$ for every $J\in\mathfrak{H}(I)$, we have
$P_0(E^I)=P_J(E^J)=1$. Thus all we have left to show is the
$\emptyset$-continuity of $P_0$. That is, we assert: If
$(Z_n)_{n\in\mathbb{N}}$ is an antitone sequence in $\mathfrak{Z}$ such that
$P_0(Z_n)\ge\alpha>0$ for all $n\in\mathbb{N}$, then
$\bigcap_{n=1}^{\infty}Z_n\ne\emptyset$. But this can be seen as follows:




Every $Z_n$ is of the form $Z_n=p_{J_n}^{-1}(B_n)$ with
$B_n\in\mathfrak{B}^{J_n}$ and $J_n\in\mathfrak{H}(I)$. Since every
$J$-cylinder is obviously also an $H$-cylinder for all $H\in\mathfrak{H}(I)$
with $J\subset H$, we can assume $J_n\subset J_{n+1}$ for all
$n\in\mathbb{N}$ without loss of generality. By Section 7.3, Example 2,
every finite product of Polish spaces is a Polish space and in particular
each of the spaces $E^{J_n}$ is Polish. Therefore, by Theorem 7.3.3, the
probability measure $P_{J_n}$ is regular on $E^{J_n}$. Thus for every $B_n$
there exists a set $K_n\subset B_n$, compact in $E^{J_n}$, satisfying
$P_{J_n}(B_n)-P_{J_n}(K_n)=P_{J_n}(B_n\setminus K_n)\le 2^{-n}\alpha$. Then
$Z_n'=p_{J_n}^{-1}(K_n)$ is a $J_n$-cylinder satisfying $Z_n'\subset Z_n$ and

    $$P_0(Z_n\setminus Z_n')=P_{J_n}(B_n\setminus K_n)\le 2^{-n}\alpha.$$

In order to obtain an antitone sequence, we set
$Y_n=Z_1'\cap\dots\cap Z_n'$ ($n\in\mathbb{N}$), so that
$Y_n\subset Z_n'\subset Z_n$ for all $n$. Then $P_0(Y_n)>0$ and hence
$Y_n\ne\emptyset$. Indeed, because of the finite additivity of $P_0$,

    $$P_0(Z_n\setminus Y_n)=P_0\Bigl(\bigcup_{\nu=1}^{n}(Z_n\setminus Z_\nu')\Bigr)
      \le P_0\Bigl(\bigcup_{\nu=1}^{n}(Z_\nu\setminus Z_\nu')\Bigr)
      \le\sum_{\nu=1}^{n}P_0(Z_\nu\setminus Z_\nu')
      \le\sum_{\nu=1}^{n}2^{-\nu}\alpha<\alpha,$$

and thus
$P_0(Y_n)=P_0(Z_n\setminus(Z_n\setminus Y_n))=P_0(Z_n)-P_0(Z_n\setminus Y_n)
>\alpha-\alpha=0$. Now we arbitrarily choose $y_n$ in $Y_n$. Because of the
antitonicity of $(Y_n)$ we have $y_{n+p}\in Y_n$ and thus $y_{n+p}\in Z_n'$
for all $p\in\mathbb{N}$, that is, $p_{J_n}(y_m)\in K_n$ for all
$m,n\in\mathbb{N}$ with $m\ge n$. For every $t\in\bigcup J_n$ which belongs
to, say, $J_n$, all terms of the sequence
$(p_{\{t\}}(y_m))_{m\in\mathbb{N}}$ with the exception of finitely many
initial terms then lie in $p^{J_n}_{\{t\}}(K_n)$. Since every projection
mapping $p^{J_n}_{\{t\}}$ is continuous, this set is compact. Now
$\bigcup J_n$ is countable as a countable union of finite sets; let
$t_1,t_2,\dots$ be the elements of this set. Then, due to the compactness
of $p^{J_n}_{\{t\}}(K_n)$, $t\in J_n$, there exists a subsequence
$(y_m^{(1)})$ of $(y_m)$ for which $(p_{\{t_1\}}(y_m^{(1)}))$ converges, a
subsequence $(y_m^{(2)})$ of $(y_m^{(1)})$ for which
$(p_{\{t_2\}}(y_m^{(2)}))$ converges, and so on. For the diagonal sequence
$(y_m^{(m)})_{m\in\mathbb{N}}$,

    $$z_t=\lim_{m\to\infty}p_{\{t\}}(y_m^{(m)})=\lim_{m\to\infty}y_m^{(m)}(t)$$

then exists for every $t\in\bigcup J_n$. Since for the initial sequence
$p_{J_n}(y_m)$ was in $K_n$ for all $m,n\in\mathbb{N}$ with $m\ge n$, we now
have

    $$p_{J_n}(y_m^{(m)})\in K_n\subset E^{J_n}$$

for all $m,n\in\mathbb{N}$ with $m\ge n$ and thus, because $E^{J_n}$ is a
product space and the compact set $K_n$ is closed,

    $$(z_{\tau_1},\dots,z_{\tau_{k_n}})\in K_n\quad\text{for all } n,$$

where $\tau_1,\dots,\tau_{k_n}$ denote the elements of $J_n$. But then the
mapping $z\colon I\to E$ defined by

    $$z(t)=\begin{cases} z_t, & \text{if } t\in\bigcup J_n,\\
      \text{an arbitrary point of } E, & \text{if } t\in I\setminus\bigcup J_n,
      \end{cases}$$

is a point from $E^I$ satisfying

    $$p_{J_n}(z)=(z_{\tau_1},\dots,z_{\tau_{k_n}})\in K_n,$$

that is, $z\in Z_n'=p_{J_n}^{-1}(K_n)$ for all $n$ and thus
$z\in\bigcap Z_n'\subset\bigcap Z_n$. Hence we have shown that the set
$\bigcap Z_n$ is nonempty. ∎


12.1.4. Corollary. If $E$ is a Polish space, $\mathfrak{B}$ is the
$\sigma$-algebra of its Borel sets, and $I$ is an arbitrary nonempty set,
then for every projective family $(P_J)_{J\in\mathfrak{H}(I)}$ of probability
measures on $(E^J,\mathfrak{B}^J)$ there exists a stochastic process with
state space $E$ and parameter set $I$ such that
$(P_J)_{J\in\mathfrak{H}(I)}$ is the family of its finite-dimensional
distributions.


Proof. We choose $\Omega=E^I$, $\mathfrak{A}=\mathfrak{B}^I$, and
$P=P_I=\varprojlim P_J$. Then $(\Omega,\mathfrak{A},P)$ is a probability
space. For every $t\in I$, the projection mapping $p_{\{t\}}$ with
$p_{\{t\}}(\omega)=\omega(t)$, $\omega\in\Omega$, is an
$\mathfrak{A}$-$\mathfrak{B}$-measurable mapping, and thus $X_t=p_{\{t\}}$ is
a random variable with values in $E$, that is,
$(\Omega,\mathfrak{A},P,(X_t)_{t\in I})$ is a stochastic process with state
space $E$ and parameter set $I$. This process yields the desired result,
since for every $J\in\mathfrak{H}(I)$ and every set $B\in\mathfrak{B}^J$

    $$P(X_J^{-1}(B))=P(p_J^{-1}(B))=p_J(P_I)(B)=P_J(B).$$ ∎

The proof thus tells us that

    $$\bigl(E^I,\mathfrak{B}^I,\varprojlim P_J,(p_{\{t\}})_{t\in I}\bigr)$$

is a process with the desired properties. We often use it explicitly in
constructions and therefore give it a special name: It is called the
canonical process³ associated with the projective family
$(P_J)_{J\in\mathfrak{H}(I)}$. Since $p_{\{t\}}(\omega)=\omega(t)$ for
$\omega\in E^I$, $E^I$ is here equal to the set of all paths of the process.
Thus every mapping $\omega\colon I\to E$ is a path of the canonical process.


Remark. 2. We still need to clarify the relationship of Theorem 12.1.3
to Theorem 5.4.2 on the existence of infinite products of probability
measures. Let $(E,\mathfrak{B})$ be an arbitrary measurable space and $P$ a
probability measure on $\mathfrak{B}$. Then by (5.4.8), for an arbitrarily
given set $I\ne\emptyset$, the following family
$(P_J)_{J\in\mathfrak{H}(I)}$ is projective: If $n$ is the number of elements
of $J$, then let $P_J=P\otimes\dots\otimes P$ be the $n$-fold product of $P$
with itself. Then, although $E$ need not be Polish, by Theorem 5.4.2 there
exists exactly one probability measure $P_I$ on $(E^I,\mathfrak{B}^I)$ with
$p_J(P_I)=P_J$ for all $J\in\mathfrak{H}(I)$.

3 The name "first canonical process" is also commonly used. See Meyer [39, p. 53].

Thus if the projective family $(P_J)$ is of the special form above, that is,
the associated stochastic process $(X_t)_{t\in I}$ is an independent family
of random variables, then the measurable space $(E,\mathfrak{B})$ need not
have any additional structure.


For an arbitrary projective family, however, we cannot get along without
additional hypotheses of a topological type on the measurable space
$(E,\mathfrak{B})$, as was shown by E. S. Andersen and B. Jessen [22]. See
also Halmos [3, p. 214, Exercise 3].


PROBLEM 


Prove that the assertions of Theorem 12.1.3 and Corollary 12.1.4
remain valid if $E$ is a locally compact space countable at infinity and if
$\mathfrak{B}$ is the $\sigma$-algebra $\mathfrak{B}_0(E)$ of its Baire sets.


12.2 PROCESSES WITH SPECIAL PATHS 


We start with a stochastic process $(\Omega,\mathfrak{A},P,(X_t)_{t\in I})$
with a Polish space $(E,\mathfrak{B})$ as state space. Let
$(P_J)_{J\in\mathfrak{H}(I)}$ be the projective family of its
finite-dimensional distributions and
$(E^I,\mathfrak{B}^I,P_I,(Y_t)_{t\in I})$ the canonical process associated
with this family. Both processes $(X_t)_{t\in I}$ and $(Y_t)_{t\in I}$ thus
have the same finite-dimensional distributions, but are different in this
respect: The set $\tilde\Omega$ of all paths of $(X_t)$ is in general a
proper subset of the path set $E^I$ of $(Y_t)$. The latter is indeed the
largest path set of a process with state space $E$ and parameter set $I$.


Example 


1. Let $\Omega=\mathbb{R}_+$, $\mathfrak{A}=\mathbb{R}_+\cap\mathfrak{B}^1$,
let $P$ be a $\lambda^1$-continuous probability measure on $\mathfrak{A}$,
and let $(\mathbb{R},\mathfrak{B}^1)$ be the state space of the two following
stochastic processes $(X_t^{*})_{t\in\mathbb{R}_+}$,
$(X_t^{**})_{t\in\mathbb{R}_+}$:

    $$X_t^{*}(\omega)=0\quad\text{for all } (t,\omega)\in\mathbb{R}_+\times\Omega,$$

    $$X_t^{**}(\omega)=\begin{cases}0, & \omega\ne t,\\ 1, & \omega=t.\end{cases}$$

We obviously have $X_t^{*}=X_t^{**}$ $P$-almost surely for all
$t\in\mathbb{R}_+$. Therefore, in particular, both processes have the same
finite-dimensional distributions: For every $J\in\mathfrak{H}(\mathbb{R}_+)$,
$P_J$ is the unit mass $\varepsilon_0$ at the point $0=(0,\dots,0)$ of
$\mathbb{R}^J$. $(X_t^{*})$ has only the constant path $t\mapsto 0$. The
various paths of $(X_t^{**})$ are given by the indicator functions on
$\mathbb{R}_+$ of the one-point subsets of $\mathbb{R}_+$, and thus
correspond one-to-one to the points of $\mathbb{R}_+$. Hence, in this
example, the only path of the first process, and no path of the second, is
continuous.
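
The two processes of this example can be written down directly. The sketch below is only an illustration in Python: the exponential distribution stands in for the continuous probability measure P, and the function names, the seed and the chosen time points are assumptions of the sketch rather than anything from the text.

```python
import random


def x_star(t, omega):
    """First process: identically zero."""
    return 0.0


def x_double_star(t, omega):
    """Second process: indicator of the single time point t = omega."""
    return 1.0 if t == omega else 0.0


rng = random.Random(0)                       # fixed seed, for repeatability
times = (0.25, 0.5, 2.0)                     # a finite set J of time points
omegas = [rng.expovariate(1.0) for _ in range(1000)]

# At fixed times the two processes differ only if omega hits one of the
# times exactly, an event of probability zero under a continuous law.
disagreements = sum(
    x_star(t, w) != x_double_star(t, w) for w in omegas for t in times
)

# Along the path belonging to one omega, however, the second process
# takes the value 1 at the single time t = omega.
w = omegas[0]
jump = x_double_star(w, w) - x_star(w, w)
```

The finite-dimensional distributions only ever look at finitely many fixed times, which is why they cannot see the difference between the two path sets.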


Therefore it seems important to refine the question of the last section
so that we ask not only about the existence of a process with given
finite-dimensional distributions but also with given path set
$\tilde\Omega$. For a more convenient formulation of this question and its
answer we introduce the following definition.


12.2.1. Definition. Let $(E,\mathfrak{B})$ be a measurable space and
$(P_J)_{J\in\mathfrak{H}(I)}$ a projective family of probability measures on
$(E^J,\mathfrak{B}^J)$. A set $\tilde\Omega\subset E^I$ is said to be
essential [relative to the family $(P_J)$] if there is a stochastic process,
with state space $E$, parameter set $I$, and finite-dimensional
distributions $P_J$, such that $\tilde\Omega$ is the set of its paths.


The construction of the canonical process contained in Corollary 12.1.4
says just that $E^I$ is always essential. We now show the following result
of J. L. Doob:


12.2.2. Theorem. A set $\tilde\Omega\subset E^I$ is essential relative to a
projective family $(P_J)_{J\in\mathfrak{H}(I)}$ if and only if

    $$P_I^{*}(\tilde\Omega)=1$$    (12.2.1)

holds for the outer measure $P_I^{*}$ associated with
$P_I=\varprojlim P_J$.


Proof. We first assume that $\tilde\Omega$ is essential. Then there exists
a process $(\Omega,\mathfrak{A},P,(X_t)_{t\in I})$ with state space $E$,
finite-dimensional distributions $P_J$, and path set $\tilde\Omega$. If we
again use the notation introduced after Definition 12.1.1, then we have
$\tilde\Omega=X_I(\Omega)$ and $P_I=X_I(P)$ by (12.1.1) and (12.1.2). Here
the mapping $X_I\colon\Omega\to E^I$ is
$\mathfrak{A}$-$\mathfrak{B}^I$-measurable. Since $\mathfrak{B}^I$ is a
$\sigma$-algebra, by (1.5.1) we have

    $$P_I^{*}(\tilde\Omega)=\inf_{Q\in\mathfrak{B}^I,\ \tilde\Omega\subset Q}P_I(Q).$$

Thus we need to show that $P_I(Q)=1$ for all $Q\in\mathfrak{B}^I$ with
$\tilde\Omega\subset Q$. But $\tilde\Omega\subset Q$ implies that
$\Omega=X_I^{-1}(\tilde\Omega)\subset X_I^{-1}(Q)\subset\Omega$, that is,
$\Omega=X_I^{-1}(Q)$, so

    $$P_I(Q)=X_I(P)(Q)=P(X_I^{-1}(Q))=P(\Omega)=1.$$

Conversely, let $P_I^{*}(\tilde\Omega)=1$. Then, as we shall show, the
process

    $$(\tilde\Omega,\ \tilde\Omega\cap\mathfrak{B}^I,\ \tilde P,\ (\tilde X_t)_{t\in I})$$    (12.2.2)

solves our problem, where $\tilde X_t$ is the restriction of the projection
$p_{\{t\}}$ to $\tilde\Omega$, that is,

    $$\tilde X_t(\omega)=p_{\{t\}}(\omega)=\omega(t)\quad\text{for all } \omega\in\tilde\Omega,$$    (12.2.3)

and where for every set
$\tilde\Omega\cap Q\in\tilde\Omega\cap\mathfrak{B}^I$ with
$Q\in\mathfrak{B}^I$, we define

    $$\tilde P(\tilde\Omega\cap Q)=P_I(Q).$$    (12.2.4)


Now it is clear that $(\tilde\Omega,\tilde\Omega\cap\mathfrak{B}^I)$ is a
measurable space, that $(\tilde X_t)_{t\in I}$ is a family of measurable
mappings of this measurable space into $(E,\mathfrak{B})$, and that by
(12.2.3), the mappings $t\mapsto\tilde X_t(\omega)$ of $I$ into $E$ for
$\omega\in\tilde\Omega$ coincide with the mappings
$\omega\in\tilde\Omega\subset E^I$. Thus we have a stochastic process with
state space $E$, parameter set $I$, and path set $\tilde\Omega$ provided
that $\tilde P$ is a probability measure. But this is obvious, provided the
definition (12.2.4) is independent of particular representations of the set
$\tilde\Omega\cap Q$ by means of $Q\in\mathfrak{B}^I$. Thus if
$\tilde\Omega\cap Q_1=\tilde\Omega\cap Q_2$ for sets
$Q_1,Q_2\in\mathfrak{B}^I$, then we must show that $P_I(Q_1)=P_I(Q_2)$. We
can assume without loss of generality that $Q_1\subset Q_2$. [Otherwise, we
note the relationships $Q_1\subset Q_1\cup Q_2$, $Q_2\subset Q_1\cup Q_2$,
and $\tilde\Omega\cap(Q_1\cup Q_2)=\tilde\Omega\cap Q_1=\tilde\Omega\cap Q_2$.]
But $Q_1\subset Q_2$ implies that
$\tilde\Omega\cap(Q_2\setminus Q_1)=\emptyset$ and thus
$\tilde\Omega\subset\complement(Q_2\setminus Q_1)$; hence, by (12.2.1),

    $$1=P_I(\complement(Q_2\setminus Q_1))=1-P_I(Q_2\setminus Q_1).$$


If we now take into account the equality
$P_I(Q_2\setminus Q_1)=P_I(Q_2)-P_I(Q_1)$, we obtain $P_I(Q_1)=P_I(Q_2)$.
Thus it remains to be proved that $(P_J)$ is the family of
finite-dimensional distributions of the new process, that is,
$P_J=\tilde X_J(\tilde P)$ for all $J\in\mathfrak{H}(I)$. This follows from
the equalities

    $$\tilde P(\tilde X_J^{-1}(B))=\tilde P(\tilde\Omega\cap p_J^{-1}(B))
      =P_I(p_J^{-1}(B))=p_J(P_I)(B)=P_J(B),$$

where $B$ is an arbitrary set in $\mathfrak{B}^J$. We need only take
(12.1.8) into account. ∎
Example 


2. Let $(E,\mathfrak{B})$ be an arbitrary measurable space and
$I\ne\emptyset$ an arbitrary set. We choose $\omega_0\in E^I$ arbitrarily.
Then for the measures $\varepsilon_{\omega_0}$
[$\varepsilon_{\omega_0(t)}$, $t\in I$] on $\mathfrak{B}^I$ [$\mathfrak{B}$]
defined by the unit mass at $\omega_0$ [$\omega_0(t)$], we obviously have

    $$\varepsilon_{\omega_0}=\bigotimes_{t\in I}\varepsilon_{\omega_0(t)}
      =\varprojlim_{J\in\mathfrak{H}(I)}\ \bigotimes_{t\in J}\varepsilon_{\omega_0(t)}.$$

Hence a set $\tilde\Omega\subset E^I$ is essential relative to the
projective family
$\bigl(\bigotimes_{t\in J}\varepsilon_{\omega_0(t)}\bigr)_{J\in\mathfrak{H}(I)}$
if and only if $\omega_0$ lies in $\tilde\Omega$.


An essential set $\tilde\Omega\subset E^I$ does not generally lie in the
$\sigma$-algebra $\mathfrak{B}^I$. For example, for a topological space $E$,
the set $C$ of all continuous mappings of $\mathbb{R}_+$ into $E$ is an
essential subset of $E^{\mathbb{R}_+}$ relative to the projective family of
probability measures which is obtained for some $\omega_0\in C$ in the
preceding example. But $C$ does not generally lie in
$\mathfrak{B}^{\mathbb{R}_+}$, as is shown by Corollary 12.2.4 following the
lemma below. This lemma expresses the intrinsically interesting fact that
every set from the $\sigma$-algebra $\mathfrak{B}^I$ is already determined by
countably many parameters $t_1,t_2,\dots\in I$.


12.2.3. Lemma. Let $(E,\mathfrak{B})$ be a measurable space and
$I\ne\emptyset$ a set. Then for every set $B\in\mathfrak{B}^I$ there exists a
countable set $S\subset I$ such that for every pair of elements $x\in E^I$
and $y\in B$, the following implication holds:

    $$x(t)=y(t)\ \text{for all } t\in S\ \Longrightarrow\ x\in B.$$    (12.2.5)


Proof. Let $\mathfrak{B}^{\circ}$ denote the system of all sets
$B\subset E^I$ for which there exists a countable set (depending on $B$)
$S\subset I$ such that the implication (12.2.5) holds for every two elements
$x\in E^I$, $y\in B$. Then $\mathfrak{B}^{\circ}$ is a $\sigma$-algebra in
$E^I$. Of the properties (1.1.1)-(1.1.3) to be proved, we show the second;
the proof of the others is obvious. Thus let $B\in\mathfrak{B}^{\circ}$ and
let $S$ be an appropriate countable subset of $I$. Then $\complement B$ lies
in $\mathfrak{B}^{\circ}$. To see this, if $x\in E^I$ and
$y\in\complement B$ are elements with $x(t)=y(t)$ for all $t\in S$, then $x$
does not lie in $B$, and hence lies in $\complement B$. Otherwise
$B\in\mathfrak{B}^{\circ}$, $x\in B$ would imply $y\in B$. For every set
$J\in\mathfrak{H}(I)$ and every set $B_0\in\mathfrak{B}^J$, the associated
$J$-cylinder $p_J^{-1}(B_0)$ obviously lies in $\mathfrak{B}^{\circ}$ since
$S=J$ has the desired properties. Hence,

    $$\bigcup_{J\in\mathfrak{H}(I)}p_J^{-1}(\mathfrak{B}^J)\subset\mathfrak{B}^{\circ}.$$

Since we have a generator of $\mathfrak{B}^I$ on the left-hand side,
$\mathfrak{B}^I\subset\mathfrak{B}^{\circ}$ and thus the assertion follows. ∎


12.2.4. Corollary. Let $E$ be a Hausdorff space of at least two points,
$\mathfrak{B}$ a $\sigma$-algebra in $E$, and $C\subset E^{\mathbb{R}_+}$ the
set of all continuous mappings of $\mathbb{R}_+$ into $E$. Then $C$ does not
lie in $\mathfrak{B}^{\mathbb{R}_+}$.


Proof. By Lemma 12.2.3, $C\in\mathfrak{B}^{\mathbb{R}_+}$ implies the
existence of a countable set $S\subset\mathbb{R}_+$ such that every mapping
$x\in E^{\mathbb{R}_+}$ which coincides on $S$ with a mapping $y\in C$ is
continuous. By taking the union of $S$ with the set of rational numbers
from $\mathbb{R}_+$ if necessary, we can assume that $S$ is dense in
$\mathbb{R}_+$. If we choose a $t_0\in\mathbb{R}_+\setminus S$, then

    $$y(t_0)=\lim_{t\to t_0,\ t\in S}y(t)$$

for every $y\in C$. Since $E$ is Hausdorff and therefore limits in $E$ are
uniquely determined, every function $y\in C$ is uniquely determined by its
values on $S$. Since $E$ contains at least two points, for every $y\in C$ we
can choose a mapping $x\colon\mathbb{R}_+\to E$ such that $x(t)=y(t)$ for all
$t\in S$ and $x(t_0)\ne y(t_0)$. Now this mapping $x$ cannot be continuous.
But this contradicts the choice of $S$. ∎


We now generally seek conditions which imply the continuity of the
paths of a process. The following two results of W. Hansen and M.
Sieveking,⁴ as well as a theorem of Kolmogorov-Prohorov, answer this
question.


12.2.5. Lemma. Let $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ be a
stochastic process with $\mathbb{R}^p$ as state space and $\mathbb{R}_+$ as
parameter set. The condition below is then sufficient in order that every
random variable $X_t$ can be modified on a null set (depending on $t$) in
such a way that all paths of the process
$(\Omega,\mathfrak{A},P,(\tilde X_t)_{t\in\mathbb{R}_+})$ thus obtained are
continuous. The condition is: There is a countable, dense subset $S$ of
$\mathbb{R}_+$ with the properties:

(a) For every pair of positive real numbers $\eta>0$, $k>0$,

    $$\lim_{\delta\to 0}P\Bigl(\bigcup_{s,t\in S\cap[0,k],\ |s-t|\le\delta}
      \{|X_s-X_t|\ge\eta\}\Bigr)=0.$$    (12.2.6)

(b) For every $t\in\mathbb{R}_+$, there is a sequence $(s_n)$ in $S$,
converging to $t$, such that for all $\eta>0$,

    $$\lim_{n\to\infty}P\{|X_{s_n}-X_t|\ge\eta\}=0.⁵$$    (12.2.7)


Proof. For every triple of positive real numbers $\delta,\eta,k$, let

    $$A(\delta,\eta,k)=\bigcup_{s,t\in S\cap[0,k],\ |s-t|\le\delta}\{|X_s-X_t|\ge\eta\}$$

denote the event occurring in (12.2.6). If $\varepsilon_0>0$ is arbitrarily
given, then, by (12.2.6), for every $k\in\mathbb{N}$ and for $\eta=1/k$ there
is a $\delta_k>0$ such that
$P(A(\delta_k,k^{-1},k))\le 2^{-k}\varepsilon_0$. For

    $$A_{\varepsilon_0}=\bigcup_{k=1}^{\infty}A(\delta_k,k^{-1},k),$$

we thus have $P(A_{\varepsilon_0})\le\varepsilon_0$. Moreover, for every
$\omega\in\complement A_{\varepsilon_0}$, every $k\in\mathbb{N}$, and every
pair of elements $s,t\in S\cap[0,k]$ with $|s-t|\le\delta_k$, we have

    $$|X_s(\omega)-X_t(\omega)|<\frac{1}{k}.$$

For each $\omega\in\complement A_{\varepsilon_0}$, and thus in particular for
every $\omega$ in

    $$\Omega_0=\bigcup_{n=1}^{\infty}\complement A_{1/n},$$

the mapping $s\mapsto X_s(\omega)$ is uniformly continuous on
$S\cap[0,\alpha]$ for arbitrary $\alpha>0$. Since the Euclidean space
$\mathbb{R}^p$ is complete, for every $\omega\in\Omega_0$ there exists

    $$\tilde X_t(\omega)=\lim_{s\to t,\ s\in S}X_s(\omega)\quad\text{for arbitrary } t\in\mathbb{R}_+,$$    (12.2.8)

and, moreover, $t\mapsto\tilde X_t(\omega)$ is then continuous on
$\mathbb{R}_+$. Because $\complement\Omega_0\subset A_{1/n}$ and

    $$P(\complement\Omega_0)\le P(A_{1/n})\le\frac{1}{n}$$

for all $n\in\mathbb{N}$, $\complement\Omega_0$ is a $P$-null set. For every
$\omega$ from this set we define

    $$\tilde X_t(\omega)=a\qquad(t\in\mathbb{R}_+),$$    (12.2.9)

where $a\in\mathbb{R}^p$ is chosen arbitrarily. Then
$(\Omega,\mathfrak{A},P,(\tilde X_t)_{t\in\mathbb{R}_+})$ yields the desired
result.

4 See H. Bauer, Markov Processes, Lecture Notes, University of Hamburg (1963).
5 If we extend Definition 2.11.2 verbatim to measurable mappings with values in
$\mathbb{R}^p$, then (12.2.7) says that $(X_{s_n})$ converges stochastically to $X_t$.

In fact, by (12.2.8) and (12.2.9), and due to the countability of $S$, each
of the mappings $\tilde X_t\colon\Omega\to\mathbb{R}^p$ is the pointwise
limit of a sequence of $(\mathbb{R}^p,\mathfrak{B}^p)$-random variables.
Hence it follows (when we decompose $\tilde X_t$ into its $p$ real
components) that $\tilde X_t$ is also an
$(\mathbb{R}^p,\mathfrak{B}^p)$-random variable. Thus
$(\tilde X_t)_{t\in\mathbb{R}_+}$ is a stochastic process whose paths are all
continuous by construction. Now we still have to show the almost sure
equality of $X_t$ and $\tilde X_t$ for every $t\in\mathbb{R}_+$. By (12.2.8)
and (12.2.9) this is obvious for all $t\in S$. For arbitrary
$t\in\mathbb{R}_+$, we need condition (b). Accordingly, there is a sequence
$(s_n)$ in $S$ with $\lim s_n=t$ such that the sequence $(X_{s_n})$ converges
stochastically to $X_t$. By (12.2.8) and (12.2.9), the sequence $(X_{s_n})$
converges almost surely to $\tilde X_t$, and thus, by Theorem 2.11.4 applied
to the $p$ component sequences, it converges stochastically to
$\tilde X_t$. Since stochastic limits are almost surely uniquely determined,
it follows that $X_t=\tilde X_t$ almost surely. ∎


Remarks. 1. It is easy to show that the condition involved in Lemma
12.2.5 is also necessary for the existence of a process
$(\Omega,\mathfrak{A},P,(\tilde X_t)_{t\in\mathbb{R}_+})$ with the given
properties. (See Problem 4.)

2. Lemma 12.2.5 remains valid when the state space $E$ is a Polish space
with complete metric $d$. Then we have to replace $|X_s-X_t|$ by
$d(X_s,X_t)$. Further, we then have to extend some of the results of
Section 2.11 on stochastic convergence to measurable mappings with values
in $E$.


12.2.6. Theorem.⁶ The following conditions (c) and (d) are sufficient
for being able to obtain a process
$(\Omega,\mathfrak{A},P,(\tilde X_t)_{t\in\mathbb{R}_+})$ with the properties
described in Lemma 12.2.5 from
$(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$; to formulate these
conditions, we set

    $$q_n(\eta)=\sup_{i=0,1,\dots,n2^n-1}P\{|X_{(i+1)2^{-n}}-X_{i2^{-n}}|\ge\eta\}$$    (12.2.10)

for arbitrary $\eta>0$ and $n\in\mathbb{N}$.
(c) There exists a sequence $(\eta_n)_{n\in\mathbb{N}}$ of positive real
numbers satisfying

    $$\sum_{n=1}^{\infty}\eta_n<\infty\quad\text{and}\quad
      \sum_{n=1}^{\infty}n2^n q_n(\eta_n)<\infty.$$    (12.2.11)

(d) For every $t\in\mathbb{R}_+$ there exists a sequence $(s_n)$ of dyadic
numbers $\ge 0$ converging to $t$ such that $(X_{s_n})_{n\in\mathbb{N}}$
converges stochastically to $X_t$ [in the sense of (12.2.7)].


Proof. Let $S=\{m2^{-n}: m,n=0,1,\dots\}$ be the set of all nonnegative
dyadic numbers. $S$ is a countable, dense subset of $\mathbb{R}_+$. We shall
show that condition (a) of Lemma 12.2.5 is satisfied with this $S$. Since
(b) is also satisfied, by (d), Lemma 12.2.5 yields the assertion.

Thus, let $\eta>0$, $k>0$, and $\varepsilon>0$ be given. Now we choose
$n_0\in\mathbb{N}$ so large that $n_0>k$ and

    $$\sum_{n=n_0}^{\infty}\eta_n\le\frac{\eta}{2}\quad\text{and}\quad
      \sum_{n=n_0}^{\infty}n2^n q_n(\eta_n)<\varepsilon.$$

For the event

    $$B=\bigcup_{n=n_0}^{\infty}\ \bigcup_{i=0}^{n2^n-1}
      \{|X_{(i+1)2^{-n}}-X_{i2^{-n}}|\ge\eta_n\},$$

we then have

    $$P(B)\le\sum_{n=n_0}^{\infty}\ \sum_{i=0}^{n2^n-1}
      P\{|X_{(i+1)2^{-n}}-X_{i2^{-n}}|\ge\eta_n\},$$

that is, by (12.2.10),

    $$P(B)\le\sum_{n=n_0}^{\infty}n2^n q_n(\eta_n)<\varepsilon.$$

6 See also Courrège [28] and Neveu [18].

Now we consider elements $s,t\in S$ satisfying $s\le t\le k$ and
$t-s\le 2^{-n_0}$. Then it will be shown below that one can decompose the
interval $[s,t]$ into finitely many intervals of the form

    $$J_{i,\nu}=[i2^{-\nu},(i+1)2^{-\nu}]$$

with $i,\nu\in\{0,1,\dots\}$ in such a way that every $\nu\in\{0,1,\dots\}$
occurs in the representation of at most two intervals $J_{i,\nu}$ of this
decomposition $\mathfrak{z}$. Since every such interval is of length
$2^{-\nu}\le t-s\le 2^{-n_0}$, only natural numbers $\nu\ge n_0$ are
involved. Since $n_0>k$, we also have

    $$(i+1)2^{-\nu}\le t\le k<n_0\le\nu$$

for the numbers $i$ and $\nu$ of each of the intervals $J_{i,\nu}$ appearing
in the decomposition.

The existence of such a decomposition can be seen as follows: There
obviously exists a uniquely determined interval
$J_0=J_{i_0,\nu_0}\subset[s,t]$ of maximal length $2^{-\nu_0}$. If
$x_0<y_0$ are the end points of $J_0$ and $s<x_0$, then we determine an
interval $J_{i_1,\nu_1}\subset[s,t]$ with $x_0$ as right end point and
maximal length $2^{-\nu_1}$. If the left end point $x_1$ of $J_{i_1,\nu_1}$
is still $>s$, we repeat the second step of the construction with $x_1$ in
place of $x_0$. After finitely many steps we have covered $[s,y_0]$ with
$J_0,J_{i_1,\nu_1},\dots,J_{i_h,\nu_h}$. Here obviously
$\nu_0<\nu_1<\dots<\nu_h$. Analogously, starting with $y_0$, we obtain
finitely many intervals $J_{i_1',\nu_1'},\dots,J_{i_l',\nu_l'}$ with
$\nu_1'<\dots<\nu_l'$, which cover $[y_0,t]$. The decomposition of $[s,t]$
by $J_0,J_{i_1,\nu_1},\dots,J_{i_h,\nu_h},J_{i_1',\nu_1'},\dots,J_{i_l',\nu_l'}$
thus yields the desired decomposition.
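
A decomposition with the stated property can also be produced mechanically. The following Python sketch is one possible way to do it and is not the construction of the text: it works from left to right and always takes the longest admissible dyadic interval that starts at the current left end point and still fits into [s,t]. The function names are hypothetical, and the sketch assumes dyadic end points with 0 < t - s <= 1, which covers the case t - s <= 2^{-n_0} considered here.

```python
from fractions import Fraction


def _is_dyadic(q):
    """True if the denominator of the fraction q is a power of two."""
    d = q.denominator
    return d & (d - 1) == 0


def dyadic_decomposition(s, t):
    """Split [s, t] into intervals [i * 2**-v, (i + 1) * 2**-v].

    Returns the list of pairs (i, v), ordered from left to right.
    """
    s, t = Fraction(s), Fraction(t)
    if not (_is_dyadic(s) and _is_dyadic(t)):
        raise ValueError("end points must be dyadic rationals")
    if not (0 <= s < t and t - s <= 1):
        raise ValueError("need 0 <= s < t and t - s <= 1")
    pieces = []
    while s < t:
        v = 0
        # Smallest level v such that s lies on the grid of mesh 2**-v
        # and the interval of length 2**-v starting at s fits into [s, t].
        while (s * 2 ** v).denominator != 1 or s + Fraction(1, 2 ** v) > t:
            v += 1
        pieces.append((int(s * 2 ** v), v))
        s += Fraction(1, 2 ** v)
    return pieces


example = dyadic_decomposition(Fraction(3, 8), Fraction(13, 16))
```

With this rule the levels first decrease strictly (while the grid position of the left end point is the limiting factor) and then increase strictly (once the remaining length is the limiting factor), so every level occurs at most twice.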
It now follows by the triangle inequality that

    $$|X_s-X_t|\le\sum_{(i,\nu)}|X_{(i+1)2^{-\nu}}-X_{i2^{-\nu}}|,$$

if we sum over all pairs $(i,\nu)$ which correspond to an interval
$J_{i,\nu}$ appearing in the decomposition $\mathfrak{z}$. Therefore, for
every $\omega\in\complement B$,

    $$|X_s(\omega)-X_t(\omega)|\le\sum_{(i,\nu)}
      |X_{(i+1)2^{-\nu}}(\omega)-X_{i2^{-\nu}}(\omega)|<\sum_{(i,\nu)}\eta_\nu,$$

if we note that $\nu\ge n_0$ and $i\le\nu 2^{\nu}-1$ hold for all admissible
$(i,\nu)$. Since, furthermore, every $\nu\ge n_0$ occurs at most twice in an
admissible $(i,\nu)$, we finally have

    $$|X_s(\omega)-X_t(\omega)|<2\sum_{n=n_0}^{\infty}\eta_n\le\eta.$$

Hence,

    $$A=\bigcup_{s,t\in S;\ s,t\le k;\ |s-t|\le 2^{-n_0}}\{|X_s-X_t|\ge\eta\}\subset B$$

and $P(A)\le P(B)<\varepsilon$. Now, condition (a) of Lemma 12.2.5
follows. ∎



12.2.7. Corollary (Kolmogorov-Prohorov Theorem).⁷ Condition (e) is
sufficient for the existence of a process
$(\Omega,\mathfrak{A},P,(\tilde X_t)_{t\in\mathbb{R}_+})$ derived from
$(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ with the properties
described in Lemma 12.2.5.
(e) There are real numbers $a>0$, $b>1$, $c>0$ such that

    $$E(|X_s-X_t|^a)\le c\,|s-t|^b$$    (12.2.12)

holds for all $s,t\in\mathbb{R}_+$.


Proof. By the Chebyshev-Markov inequality (2.11.1) and by (12.2.12)
we have, for every $\eta>0$,

    $$P\{|X_s-X_t|\ge\eta\}\le\eta^{-a}E(|X_s-X_t|^a)\le c\,\eta^{-a}|s-t|^b.$$

Therefore by (12.2.10), for all $n\in\mathbb{N}$,

    $$q_n(\eta)\le c\,\eta^{-a}2^{-nb}.$$

Since $b>1$, we can choose $\delta>0$ such that $b-a\delta>1$. But then, for
the sequence of positive numbers $\eta_n=2^{-n\delta}$, $n=1,2,\dots$, the
series $\sum_{n=1}^{\infty}\eta_n$ and
$\sum_{n=1}^{\infty}n2^n q_n(\eta_n)$ are convergent. Indeed, the first is a
geometric series; the second has as majorant the series
$c\sum_{n=1}^{\infty}n2^{-n(b-a\delta-1)}$, which is convergent by the ratio
test. Therefore condition (c) of Theorem 12.2.6 is satisfied. Moreover, for
every convergent sequence $(s_n)$ of real numbers $\ge 0$ with
$t=\lim s_n$ we also have, for arbitrary $\eta>0$,

    $$P\{|X_{s_n}-X_t|\ge\eta\}\le c\,\eta^{-a}|s_n-t|^b.$$

Hence $(X_{s_n})$ converges stochastically to $X_t$, that is, condition (d)
is also satisfied. Thus the assertion follows from Theorem 12.2.6. ∎
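
The convergence of the majorant series can be checked numerically. The Python sketch below uses the illustrative constants a = 4, b = 2, c = 3, which are the values one obtains for a standard one-dimensional Brownian motion, where E|X_s - X_t|^4 = 3|s - t|^2; this example, the choice delta = 1/8 and the function names are assumptions of the sketch and are not taken from the text.

```python
def majorant_terms(a, b, c, delta, n_max):
    """Terms c * n * 2**(-n * (b - a*delta - 1)) for n = 1, ..., n_max.

    They bound n * 2**n * q_n(eta_n) when eta_n = 2**(-n * delta).
    """
    rate = b - a * delta - 1.0
    if rate <= 0:
        raise ValueError("need b - a*delta > 1 for convergence")
    return [c * n * 2.0 ** (-n * rate) for n in range(1, n_max + 1)]


def majorant_limit(a, b, c, delta):
    """Closed form of the full series: sum of n * r**n is r / (1 - r)**2."""
    r = 2.0 ** (-(b - a * delta - 1.0))
    return c * r / (1.0 - r) ** 2


# Brownian-type constants (illustrative): a = 4, b = 2, c = 3, delta = 1/8.
partial_sum = sum(majorant_terms(4, 2, 3, 0.125, 400))
limit = majorant_limit(4, 2, 3, 0.125)
```

Any delta with 0 < delta < (b - 1)/a works; smaller values of delta make the majorant series converge faster but the series of the eta_n converge more slowly.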


Finally, we show the connection with the question posed at the beginning 
of this section. 


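12.2.8. Lemma. Let $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ be a
canonical process, with state space $\mathbb{R}^p$, from which we can derive
a process $(\Omega,\mathfrak{A},P,(\tilde X_t)_{t\in\mathbb{R}_+})$ with the
properties described in Lemma 12.2.5. Then the set $C$ of all continuous
mappings of $\mathbb{R}_+$ into $\mathbb{R}^p$ is essential relative to the
family of finite-dimensional distributions of $(X_t)_{t\in\mathbb{R}_+}$.


7 See Prohorov [41].

Proof. Since the original process is canonical,
$\Omega=(\mathbb{R}^p)^{\mathbb{R}_+}$,
$\mathfrak{A}=(\mathfrak{B}^p)^{\mathbb{R}_+}$, and $X_t(\omega)=\omega(t)$
for all $(\omega,t)\in\Omega\times\mathbb{R}_+$. By Theorem 12.2.2, we must
show that $P(N)=0$ for every set $N\in\mathfrak{A}$ which is disjoint from
$C$. By Lemma 12.2.3, applied to $\complement N$, there is a countable set
$S\subset\mathbb{R}_+$ such that $\omega\in\complement N$,
$\tilde\omega\in\Omega$, and $\omega(t)=\tilde\omega(t)$ for all $t\in S$
imply $\tilde\omega\in\complement N$. According to the assumed connection
between the processes $(X_t)$ and $(\tilde X_t)$, for every
$t\in\mathbb{R}_+$ there is a $P$-null set $N_t\in\mathfrak{A}$ such that
$X_t(\omega)=\tilde X_t(\omega)$ for all $\omega\in\complement N_t$.
Moreover, $t\mapsto\tilde X_t(\omega)$ is an element $\tilde\omega\in C$ for
every $\omega\in\Omega$. After these preliminaries we conclude as follows:
$N_0=\bigcup_{t\in S}N_t$ is a $P$-null set. For the $\tilde\omega$
associated with each $\omega\in\complement N_0$ we have
$\omega(t)=\tilde\omega(t)$ for arbitrary $t\in S$; $\tilde\omega$ lies in
$C$ and thus in $\complement N$. By the choice of $S$ (with the roles of
$\omega$ and $\tilde\omega$ interchanged) we now have
$\omega\in\complement N$. Hence $\complement N_0\subset\complement N$, and
thus we have shown that $N\subset N_0$ and $P(N)\le P(N_0)=0$. ∎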


We shall discuss applications of this result in Section 12.6. 


PROBLEMS 


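1. Let $\Omega$ be a set, $(\mathfrak{A}_i)_{i \in I}$ a family of $\sigma$-algebras in $\Omega$, and $\mathfrak{A}_0$ the $\sigma$-algebra
in $\Omega$ generated by $\bigcup_{i \in I} \mathfrak{A}_i$. Prove: For every $A \in \mathfrak{A}_0$ there is a countable
subset $S \subset I$ such that $A$ is an element of the $\sigma$-algebra in $\Omega$
generated by $\bigcup_{i \in S} \mathfrak{A}_i$.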

2. Let $E$ be a topological space and $\mathfrak{B} = \mathfrak{B}(E)$ the $\sigma$-algebra of its Borel
sets. Then for every set $I$, the product space $E^I$ and the $\sigma$-algebra
$\mathfrak{B}(E^I)$ of its Borel sets are defined. Prove:

(a) $\mathfrak{B}^I \subset \mathfrak{B}(E^I)$.

(b) $\mathfrak{B}^I = \mathfrak{B}(E^I)$ for every countable set $I$.

(c) For uncountable $I$, $\mathfrak{B}^I$ and $\mathfrak{B}(E^I)$ do not coincide in general.
There may even exist open sets $G$ in $E^I$ such that $G \notin \mathfrak{B}^I$.

3. Let $(\Omega, \mathfrak{A}, P, (X_t)_{t \in I})$ be a stochastic process with state space $(E, \mathfrak{B})$.
Prove: For each $B \in \mathfrak{B}$ there exists a countable set $S \subset I$ such that
the event

\[ A_t = \bigcap_{s \in S} \{X_s \in B\} \cap \{X_t \in \complement B\} \]

has probability zero for all $t \in I$. [Hint: Choose $s_1 \in I$ arbitrary.
Suppose that $s_1, \ldots, s_n \in I$ are chosen. Define

\[ c_n = \sup_{t \in I} P\{X_{s_1} \in B, \ldots, X_{s_n} \in B, X_t \in \complement B\} \]

and choose $s_{n+1} \in I$ such that
Ec cu wu xbox c BDBLEO-CYDnn;:. 


Show that $S = \{s_n : n \in \mathbb{N}\}$ has all required properties.]

4. Prove that conditions (a) and (b) of Lemma 12.2.5 are also necessary
for the existence of a process $(\Omega, \mathfrak{A}, P, (\tilde{X}_t)_{t \in \mathbb{R}_+})$ with the properties
mentioned in Lemma 12.2.5. Show that $S$ may be any countable dense
subset of $\mathbb{R}_+$. [Hint: Note that every continuous path $t \to \tilde{X}_t(\omega)$ is
uniformly continuous on each compact interval $[0, k]$.]

5. Decide whether there exists a stochastic process (X;)gr with R as 
state space such that the family (X,),cr is independent, all X, have 
the same distribution, and all paths of the process are continuous. 


12.3 MARKOV SEMIGROUPS 


We now study an important method for constructing special projective 
families of probability measures and thus for constructing stochastic 
processes. We present a brief consideration of kernels as a preliminary. 

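Let $(\Omega_i, \mathfrak{A}_i)$, $i = 1, 2, 3$, be three measurable spaces. Let $K_i$, for $i = 1, 2$,
be a kernel from $(\Omega_i, \mathfrak{A}_i)$ to $(\Omega_{i+1}, \mathfrak{A}_{i+1})$. By (10.3.6) we can consider $K_i$ as an
additive, positive-homogeneous, Daniell-continuous mapping of $\mathcal{E}_{\mathfrak{A}_{i+1}}$ into
$\mathcal{E}_{\mathfrak{A}_i}$. Here $\mathcal{E}_{\mathfrak{A}_i}$ again denotes the set of $\mathfrak{A}_i$-measurable numerical functions
$\ge 0$ on $\Omega_i$. Thus we can consider the composed mapping $K_1 \circ K_2: \mathcal{E}_{\mathfrak{A}_3} \to \mathcal{E}_{\mathfrak{A}_1}$.
This is obviously additive, positive-homogeneous, and Daniell-
continuous, and thus, by Lemma 10.3.2, is a kernel $K$ from $(\Omega_1, \mathfrak{A}_1)$ to
$(\Omega_3, \mathfrak{A}_3)$. We call $K$ the composition of $K_1$ with $K_2$, and we usually write
$K_1 K_2$ rather than $K_1 \circ K_2$. By definition, for every function $f \in \mathcal{E}_{\mathfrak{A}_3}$,

\[ [K_1 K_2 f](\omega_1) = [K_1(K_2 f)](\omega_1) = \int K_1(\omega_1, d\omega_2) \int K_2(\omega_2, d\omega_3) f(\omega_3) = \iint K_1(\omega_1, d\omega_2) K_2(\omega_2, d\omega_3) f(\omega_3), \tag{12.3.1} \]

for every $\omega_1 \in \Omega_1$. By choosing an indicator function $f = 1_{A_3}$, we then
obtain the kernel $K = K_1 K_2$ in the form

\[ [K_1 K_2](\omega_1, A_3) = \int K_1(\omega_1, d\omega_2) K_2(\omega_2, A_3), \tag{12.3.2} \]

for arbitrary $(\omega_1, A_3) \in \Omega_1 \times \mathfrak{A}_3$. From (12.3.2) we can see that $K_1 K_2$ is
sub-Markov (Markov) if the kernels $K_1$ and $K_2$ are sub-Markov (Markov).
In both of these cases we can interpret $K_1$, $K_2$, and $K_1 K_2$ as positive linear
mappings $K_1: \mathcal{E}^b_{\mathfrak{A}_2} \to \mathcal{E}^b_{\mathfrak{A}_1}$, $K_2: \mathcal{E}^b_{\mathfrak{A}_3} \to \mathcal{E}^b_{\mathfrak{A}_2}$, $K_1 K_2: \mathcal{E}^b_{\mathfrak{A}_3} \to \mathcal{E}^b_{\mathfrak{A}_1}$ of the vector
spaces $\mathcal{E}^b_{\mathfrak{A}_i}$ of bounded, real, $\mathfrak{A}_i$-measurable functions on $\Omega_i$. $K_1 K_2$ is then
again simply the composition of the first mapping with the second.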

Now the following definition is meaningful: 
12.3.1. Definition. Let $(P_t)_{t \in \mathbb{R}_+}$ be a family of kernels on a measur-
able space $(E, \mathfrak{B})$ indexed by $\mathbb{R}_+$.⁸ If

\[ P_{s+t} = P_s P_t \quad \text{for all } s, t \in \mathbb{R}_+, \tag{12.3.3} \]

then we call $(P_t)_{t \in \mathbb{R}_+}$ a semigroup of kernels on $E$. If in addition all kernels
$P_t$ are sub-Markov (Markov), then the semigroup is said to be sub-
Markov (Markov).


Equations (12.3.3) are known as the Chapman-Kolmogorov equations.
By (12.3.2) they say, in more detail, that for arbitrary $(x, B) \in E \times \mathfrak{B}$
and $s, t \in \mathbb{R}_+$,

\[ P_{s+t}(x, B) = \int P_s(x, dy) P_t(y, B). \tag{12.3.3'} \]

Since $P_{s+t} = P_s P_t$ and $P_{t+s} = P_t P_s$ by (12.3.3), it follows that $P_s P_t =
P_t P_s$, that is, every semigroup of kernels is commutative.

In applications we often encounter families $(P_t)_{t > 0}$ of kernels such that
$P_{s+t} = P_s P_t$ for all $s > 0$, $t > 0$. If we enlarge such a family by the unit
kernel $P_0 = I$ on $E$ (see Section 10.3, Example 4), we obtain a semigroup
of kernels in the above sense.

We now give a probabilistic interpretation of the new concept. Our
point of view here is a naive one which, however, will be formally justified
below. We refer back to the interpretation of a kernel already developed
in Section 10.3, Example 2. Thus, let $(P_t)_{t \in \mathbb{R}_+}$ be a family of Markov
kernels on $(E, \mathfrak{B})$ with $P_0 = I$. Then we can interpret $P_t(x, B)$ as the
probability that a particle randomly moving in $E$, which starts at $x \in E$
at time zero, finds itself in the set $B \in \mathfrak{B}$ at time $t \in \mathbb{R}_+$. (This interpre-
tation is also justified for $t = 0$ since $P_0 = I$.) Such a particle will have
no "memory"; its further movement is influenced only by chance and not
by the "experiences" at times $< t$. Thus if the particle starting at $x$ finds
itself in the "volume element" $dy$ at time $s$, then we compute the prob-
ability with which we find it in a set $B \in \mathfrak{B}$ at time $s + t$ as

\[ P_{s+t}(x, B) = \int P_s(x, dy) P_t(y, B). \]

But this is just the requirement $P_{s+t} = P_s P_t$. Thus our interpretation
automatically leads us to the semigroup property as long as the moving
particle has no memory.
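
On a finite state space a kernel is just a stochastic matrix, so the Chapman-Kolmogorov equations (12.3.3) can be checked numerically. The following is a minimal sketch and not part of the text: it assumes a two-state space $E = \{0, 1\}$ and a particle that jumps from 0 to 1 at rate `a` and from 1 to 0 at rate `b` (both hypothetical parameters), for which the transition matrices have a well-known closed form.

```python
import math

def two_state_kernel(t, a=1.5, b=0.5):
    """Matrix of P_t for a two-state chain with jump rates a (0 -> 1) and b (1 -> 0)."""
    lam = a + b
    decay = math.exp(-lam * t)
    return [
        [(b + a * decay) / lam, (a - a * decay) / lam],
        [(b - b * decay) / lam, (a + b * decay) / lam],
    ]

def compose(K1, K2):
    """Composition K1 K2 on a finite space: (K1 K2)(x, z) = sum over y of K1(x, y) K2(y, z)."""
    n = len(K1)
    return [[sum(K1[x][y] * K2[y][z] for y in range(n)) for z in range(n)] for x in range(n)]

def max_abs_diff(A, B):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(A[i][j] - B[i][j]) for i in range(len(A)) for j in range(len(A[0])))

s, t = 0.3, 1.1
P_s, P_t, P_st = two_state_kernel(s), two_state_kernel(t), two_state_kernel(s + t)

ck_error = max_abs_diff(P_st, compose(P_s, P_t))                   # P_{s+t} versus P_s P_t
comm_error = max_abs_diff(compose(P_s, P_t), compose(P_t, P_s))   # commutativity
row_sums = [sum(row) for row in P_t]                               # Markov: each row has mass 1
print(ck_error, comm_error, row_sums)
```

Both errors come out at the level of floating-point rounding, and `two_state_kernel(0)` is the identity matrix, matching the convention $P_0 = I$.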


Now we finally proceed to examples of semigroups of kernels. 


Examples 


1. Let $(E, \mathfrak{B})$ be an arbitrary measurable space and $P_t = I$ the unit
kernel on $E$ for all $t \in \mathbb{R}_+$. Then $(P_t)_{t \in \mathbb{R}_+}$ is a Markov semigroup.
In our interpretation, it describes a particle that starts at $x \in E$ and
never leaves this point.


8 Thus every $P_t$ is a kernel from $(E, \mathfrak{B})$ to $(E, \mathfrak{B})$.


2. Let $(P_t)_{t \in \mathbb{R}_+}$ be a sub-Markov semigroup on a measurable space
$(E, \mathfrak{B})$. Then for every $\lambda \in \mathbb{R}_+$, $(e^{-\lambda t} P_t)_{t \in \mathbb{R}_+}$ is also a sub-Markov semi-
group. Here of course $e^{-\lambda t} P_t$ is the kernel

\[ (x, B) \to e^{-\lambda t} P_t(x, B). \]


3. Let $(P_t)_{t \in \mathbb{R}_+}$ be a Markov semigroup of kernels on $(\mathbb{R}^p, \mathfrak{B}^p)$. It is said
to be translation-invariant (or spatially homogeneous) if

\[ P_t(x, B) = P_t(x + z, B + z) \]

holds for all $(x, B) \in \mathbb{R}^p \times \mathfrak{B}^p$, $t \in \mathbb{R}_+$, and all $z \in \mathbb{R}^p$.

Then we define

\[ \mu_t(B) = P_t(0, B) \]

and thus obtain a family $(\mu_t)_{t \in \mathbb{R}_+}$ of probability measures on $\mathfrak{B}^p$ such that
$P_t(x, B) = \mu_t(B - x)$. The semigroup property for $(P_t)$ implies that

\[ \mu_{s+t}(B) = P_{s+t}(0, B) = \int P_s(0, dy) P_t(y, B) = \int \mu_s(dy)\, \mu_t(B - y) = (\mu_s * \mu_t)(B), \]

that is,

\[ \mu_{s+t} = \mu_s * \mu_t \quad \text{for all } s, t \in \mathbb{R}_+; \tag{12.3.4} \]

thus $(\mu_t)_{t \in \mathbb{R}_+}$ is a convolution semigroup of probability measures on $\mathbb{R}^p$.
Conversely, if we are given such a convolution semigroup $(\mu_t)_{t \in \mathbb{R}_+}$ and we
require the measurability of $x \to \mu_t(B - x)$ for all $t \in \mathbb{R}_+$ and $B \in \mathfrak{B}^p$,
then

\[ P_t(x, B) = \mu_t(B - x) \tag{12.3.5} \]

yields a translation-invariant Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ of kernels on
$(\mathbb{R}^p, \mathfrak{B}^p)$.

In this special case it is also of interest to determine $P_t f$ for Borel-
measurable functions $f \ge 0$. By definition, for every set $B \in \mathfrak{B}^p$ and
every $x \in \mathbb{R}^p$,

\[ (P_t 1_B)(x) = P_t(x, B) = \mu_t(B - x) = \int 1_{B - x}(y)\, \mu_t(dy) = \int 1_B(x + y)\, \mu_t(dy) = \int 1_B(x - y)\, \check{\mu}_t(dy). \]

Here $\check{\mu}_t$ denotes the image measure $S(\mu_t)$ relative to the reflection $S(x) =
-x$ through the origin. When we approximate $f$ by an isotone sequence of
elementary functions, we obtain $(P_t f)(x) = \int f(x - y)\, \check{\mu}_t(dy)$. Hence

\[ P_t f = \check{\mu}_t * f. \tag{12.3.6} \]

In this case, therefore, we call $P_t$ a convolution kernel.
Further, we are sure to have measurability of $x \to \mu_t(B - x)$ for a
given $t \in \mathbb{R}_+$ if $\mu_t$ has a density $q_t$ relative to the L-B-measure $\lambda^p$, that is,
if $\mu_t = q_t \lambda^p$. Since

\[ \mu_t(B - x) = \int_{B - x} q_t \, d\lambda^p = \int_B q_t(y - x)\, \lambda^p(dy) \]

and due to the $\mathfrak{B}^p \otimes \mathfrak{B}^p$-measurability of $(x, y) \to q_t(y - x)$, this follows
from Theorem 3.2.6. For every Borel-measurable function $f \ge 0$, we then
write $P_t f$ in the form

\[ P_t f = \check{q}_t * f, \tag{12.3.7} \]

where $\check{q}_t(x) = q_t(-x)$ for all $x \in \mathbb{R}^p$.


4. Let $(\mu_t)_{t \in \mathbb{R}_+}$ be the Brownian convolution semigroup, defined in
Section 9.3, Example 2, on $\mathbb{R}^p$. For every $t > 0$, $\mu_t = \nu_{0,t} \otimes \cdots \otimes \nu_{0,t}$,
with $p$ factors; thus,⁹ $\mu_t$ has the following density $g_t$ relative to the L-B-
measure $\lambda^p$:

\[ g_t(x) = \prod_{i=1}^{p} \left(\frac{1}{2\pi t}\right)^{1/2} e^{-x_i^2/2t} = \left(\frac{1}{2\pi t}\right)^{p/2} e^{-|x|^2/2t}. \tag{12.3.8} \]

Thus, by Example 3, we associate with $(\mu_t)$ a translation-invariant Markov
semigroup $(P_t)_{t \in \mathbb{R}_+}$ of kernels on $\mathbb{R}^p$. This semigroup is called the Brownian
semigroup or semigroup of Brownian motion in $\mathbb{R}^p$. (Note that $\check{g}_t = g_t$ for
all $t > 0$.)
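
The convolution property (12.3.4) behind the Brownian semigroup can be checked numerically in dimension $p = 1$, where it reads $g_s * g_t = g_{s+t}$. The sketch below is illustrative only: the truncation interval, the step count and the sample points are arbitrary choices, and the integral is approximated by a plain Riemann sum.

```python
import math

def g(t, x):
    """Density g_t of the one-dimensional Brownian semigroup (normal law, mean 0, variance t)."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def convolve_at(s, t, x, half_width=12.0, steps=4800):
    """Riemann-sum approximation of (g_s * g_t)(x) = integral of g_s(y) g_t(x - y) dy."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        total += g(s, y) * g(t, x - y)
    return h * total

s, t = 0.5, 1.25
errors = [abs(convolve_at(s, t, x) - g(s + t, x)) for x in (-2.0, 0.0, 0.7, 3.0)]
print(max(errors))
```

Because the integrand is smooth and decays rapidly, the Riemann sum agrees with $g_{s+t}$ far beyond the accuracy needed to see the semigroup property.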


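5. On the real line $\mathbb{R}$, let $(\pi_t)_{t \in \mathbb{R}_+}$ be the Poisson convolution semigroup
on $\mathbb{R}$ of Section 9.3, Example 3. For every Borel set $B \in \mathfrak{B}^1$ and every
$x \in \mathbb{R}$,

\[ \pi_t(B - x) = \sum_{\nu=0}^{\infty} e^{-t} \frac{t^\nu}{\nu!}\, \varepsilon_\nu(B - x) = \sum_{\nu=0}^{\infty} e^{-t} \frac{t^\nu}{\nu!}\, 1_{B - \nu}(x), \]

and hence $x \to \pi_t(B - x)$ is Borel-measurable. Thus, by Example 3, $(\pi_t)$
is associated with a Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ of kernels on $\mathbb{R}$. Since
$\pi_t(\mathbb{N} \cup \{0\}) = 1$,

\[ P_t(x, \mathbb{Z}) = \pi_t(\mathbb{Z} - x) = \pi_t(\mathbb{Z}) = 1 \]

holds, for the set $\mathbb{Z}$ of all integers, for all $x \in \mathbb{Z}$ and $t \ge 0$. Therefore
$(P_t)_{t \in \mathbb{R}_+}$ is a Markov semigroup of kernels on $(\mathbb{Z}, \mathfrak{P}(\mathbb{Z}))$, since $\mathbb{Z} \cap \mathfrak{B}^1 =
\mathfrak{P}(\mathbb{Z})$. We call this Markov semigroup on $\mathbb{Z}$ the Poisson semigroup on $\mathbb{Z}$.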
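
For the Poisson semigroup on $\mathbb{Z}$ all integrals reduce to sums, so the relations above can be evaluated directly. The following sketch is an illustration under the assumption that $B$ is a finite set of integers; the function names are hypothetical helpers, not notation from the text.

```python
import math

def poisson_mass(t, n):
    """Mass pi_t({n}) = e^{-t} t^n / n! of the Poisson law with parameter t (pi_0 is the unit mass at 0)."""
    if n < 0:
        return 0.0
    if t == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-t) * t ** n / math.factorial(n)

def convolved_mass(s, t, n):
    """Mass of pi_s * pi_t at n: sum over k of pi_s({k}) pi_t({n - k})."""
    return sum(poisson_mass(s, k) * poisson_mass(t, n - k) for k in range(n + 1))

def P(t, x, B):
    """Poisson semigroup on Z: P_t(x, B) = pi_t(B - x) for a finite set B of integers."""
    return sum(poisson_mass(t, y - x) for y in B)

s, t, x, B = 0.8, 2.3, -1, {0, 2, 5}

# Convolution semigroup property (12.3.4) on the first few masses.
conv_gap = max(abs(convolved_mass(s, t, n) - poisson_mass(s + t, n)) for n in range(15))

# Chapman-Kolmogorov (12.3.3'): the particle only moves upward, so y ranges over x, ..., max(B).
lhs = P(s + t, x, B)
rhs = sum(P(s, x, {y}) * P(t, y, B) for y in range(x, max(B) + 1))
print(conv_gap, abs(lhs - rhs))
```

The sum over $y$ can stop at `max(B)` because $P_t(y, B) = 0$ once $y$ lies above every point of $B$.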


9 If $\sigma_1 = s_1 \tau_1$ and $\sigma_2 = s_2 \tau_2$ (with measures $\sigma_i$, $\tau_i$, and densities $s_i$), then $(x_1, x_2) \to
s_1(x_1) s_2(x_2)$ is the density of $\sigma_1 \otimes \sigma_2$ relative to $\tau_1 \otimes \tau_2$.
6. According to the remark at the end of Section 9.4, for every $\alpha \in\, ]0, 2]$
there exists exactly one Lebesgue continuous measure $\mu_\alpha \in \mathfrak{M}^1_+(\mathbb{R})$ with
Fourier transform

\[ \hat{\mu}_\alpha(x) = e^{-|x|^\alpha} \quad (x \in \mathbb{R}). \]

Then for arbitrarily given $c > 0$ and $t > 0$ there is [by (8.1.10)] exactly
one measure $\nu_t \in \mathfrak{M}^1_+(\mathbb{R})$ with Fourier transform

\[ \hat{\nu}_t(x) = e^{-tc|x|^\alpha} \quad (x \in \mathbb{R}). \]

If we now set $\nu_0 = \varepsilon_0$, then $(\nu_t)_{t \in \mathbb{R}_+}$ is obviously a convolution semigroup
of probability measures which depends on the parameters $\alpha \in\, ]0, 2]$ and
$c > 0$. The translation-invariant Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ on $\mathbb{R}$ asso-
ciated with the above convolution semigroup according to Example 3 is
called the stable symmetric semigroup of order $\alpha$ (with parameter $c > 0$).
For $\alpha = 2$ and $c = \tfrac{1}{2}$ we obtain the Brownian semigroup on $\mathbb{R}$. When
$\alpha = c = 1$, $\nu_t$ is the Cauchy distribution $\gamma_t$ for every $t > 0$. Therefore in
this case we call $(P_t)$ the Cauchy semigroup on $\mathbb{R}$. Analogs of these semi-
groups can also be defined in higher dimensions.


7. In a Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$, $P_0$ is generally not the unit kernel.
In fact, if $(E, \mathfrak{B}, \mu)$ is a probability space, then let $P_t$ be the kernel $P_t(x, B) =
\mu(B)$ ($x \in E$, $B \in \mathfrak{B}$, $t \in \mathbb{R}_+$) independent of $x \in E$. Then $(P_t)$ is
obviously a Markov semigroup and $P_0 \ne I$ provided $E$ does not consist
of only one point.
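
A small numerical instance of Example 7 may help. In this sketch the three-point space $E = \{0, 1, 2\}$ and the weights of $\mu$ are arbitrary assumptions chosen for illustration.

```python
mu = [0.1, 0.6, 0.3]            # a probability measure on E = {0, 1, 2}
P = [list(mu) for _ in mu]      # P_t(x, {y}) = mu({y}) for every t >= 0 and every x

def compose(K1, K2):
    """Composition of two kernels on a finite space, written as matrices."""
    n = len(K1)
    return [[sum(K1[x][y] * K2[y][z] for y in range(n)) for z in range(n)] for x in range(n)]

PP = compose(P, P)              # P_s P_t; it should again equal P = P_{s+t}
semigroup_gap = max(abs(PP[x][z] - P[x][z]) for x in range(3) for z in range(3))

identity = [[1.0 if x == z else 0.0 for z in range(3)] for x in range(3)]
is_unit_kernel = (P == identity)
print(semigroup_gap, is_unit_kernel)
```

Every row of `P` is the same measure, so composing two such kernels reproduces `P`, while `P` itself (and hence $P_0$) differs from the unit kernel.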


We now show that we can arrive at a projective family of probability
measures via a Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ of kernels on a measurable
space $(E, \mathfrak{B})$. The idea for this is suggested by the "randomly moving
particle without memory." Let $t_1 < \cdots < t_n$ be $n$ time points and
$B_1, \ldots, B_n$ sets from $\mathfrak{B}$. If the particle starts at $x_0$, then we use the
formula

\[ \int_{B_1} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n)\, P_{t_{n-1} - t_{n-2}}(x_{n-2}, dx_{n-1}) \cdots P_{t_1}(x_0, dx_1) \]

to compute the probability that the particle is found successively in
$B_1, \ldots, B_n$ at times $t_1, \ldots, t_n$. If the starting point $x_0$ is chosen at
random via a starting probability, that is, a probability measure $\mu$ on $\mathfrak{B}$,
then we have another integration to perform, namely, we have to apply
the following formula:

\[ \int \int_{B_1} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x_0, dx_1)\, \mu(dx_0) = \int \int \cdots \int 1_{B_1 \times \cdots \times B_n}(x_1, \ldots, x_n)\, P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x_0, dx_1)\, \mu(dx_0). \]

After these preliminaries, we now assert:


12.3.2. Theorem. On a measurable space $(E, \mathfrak{B})$, let $(P_t)_{t \in \mathbb{R}_+}$ be a
Markov semigroup and $\mu$ be a probability measure. For every set
$J = \{t_1, \ldots, t_n\} \subset \mathbb{R}_+$ with elements $t_1 < t_2 < \cdots < t_n$ and every
$B \in \mathfrak{B}^J$ we define

\[ P_J(B) = \int \int \cdots \int 1_B(x_1, \ldots, x_n)\, P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x_0, dx_1)\, \mu(dx_0). \tag{12.3.9} \]

Then $(P_J)_{J \in \mathfrak{H}(\mathbb{R}_+)}$ is a projective family of probability measures on $(E, \mathfrak{B})$.


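Proof. Every $P_J$ is a measure on $\mathfrak{B}^J$. We need only consider a sequence
$(B_i)_{i = 1, 2, \ldots}$ of pairwise disjoint sets from $\mathfrak{B}^J$ and integrate successively
the equality $1_{\bigcup B_i} = \sum 1_{B_i}$. $P_J$ is a probability measure since obviously

\[ P_J(E^J) = \int \int \cdots \int P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x_0, dx_1)\, \mu(dx_0) = 1. \]

Thus we need show only the projectivity of the family $(P_J)$. In view of the
remark after Definition 12.1.2, we need to show that $p^H_J(P_H) = P_J$ for any
two sets $J, H \in \mathfrak{H}(\mathbb{R}_+)$ for which $J \subset H$ and $H \setminus J$ is a singleton. Let
$t_1 < \cdots < t_n$ be the elements of $J$ and $H \setminus J = \{t'\}$. We have to show
that $p^H_J(P_H) = P_J$, that is,

\[ P_H((p^H_J)^{-1}(C)) = P_J(C) \]

for all $C \in \mathfrak{B}^J$. Since the sets $B_1 \times \cdots \times B_n$ with $B_1, \ldots, B_n \in \mathfrak{B}$
form an $\cap$-stable generator of $\mathfrak{B}^J$, then by Theorem 1.5.5 it suffices to
verify this equality only for sets $C = B_1 \times \cdots \times B_n$ of this generator.
Now we assume that $t_i < t' < t_{i+1}$ for some $i = 1, \ldots, n - 1$. But
then

\[ P_H((p^H_J)^{-1}(C)) = P_H(B_1 \times \cdots \times B_i \times E \times B_{i+1} \times \cdots \times B_n) \]
\[ = \int \int_{B_1} \cdots \int_{B_i} \int_E \int_{B_{i+1}} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_{i+1} - t'}(x', dx_{i+1})\, P_{t' - t_i}(x_i, dx') \cdots \mu(dx_0) \tag{12.3.10} \]
\[ = \int \int_{B_1} \cdots \int_{B_i} \int_E \int_{B_{i+1}} f(x_{i+1})\, P_{t_{i+1} - t'}(x', dx_{i+1})\, P_{t' - t_i}(x_i, dx') \cdots \mu(dx_0), \]

where

\[ f(x_{i+1}) = \begin{cases} 1, & i = n - 1, \\ \displaystyle\int_{B_{i+2}} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_{i+2} - t_{i+1}}(x_{i+1}, dx_{i+2}), & i < n - 1. \end{cases} \]

According to our observations concerning (10.3.4), $f$ is a $\mathfrak{B}$-measurable
function $\ge 0$ on $E$. As a consequence of the semigroup property,

\[ \int_E \int_{B_{i+1}} f(x_{i+1})\, P_{t_{i+1} - t'}(x', dx_{i+1})\, P_{t' - t_i}(x_i, dx') = P_{t' - t_i} P_{t_{i+1} - t'}(1_{B_{i+1}} f)(x_i) \tag{12.3.11} \]
\[ = P_{t_{i+1} - t_i}(1_{B_{i+1}} f)(x_i) = \int_{B_{i+1}} f(x_{i+1})\, P_{t_{i+1} - t_i}(x_i, dx_{i+1}). \]

From (12.3.10) and (12.3.11) we obtain

\[ P_H((p^H_J)^{-1}(C)) = \int \int_{B_1} \cdots \int_{B_i} \int_{B_{i+1}} f(x_{i+1})\, P_{t_{i+1} - t_i}(x_i, dx_{i+1}) \cdots \mu(dx_0) \]
\[ = \int \int_{B_1} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x_0, dx_1)\, \mu(dx_0). \]

But this is $P_J(C)$ by definition.
The case $t' < t_1$ can be handled in the same way. The case $t_n < t'$ is
trivial. □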
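
On a finite state space the measures $P_J$ of (12.3.9) are finite sums, and the projectivity just proved can be observed directly. The sketch below assumes a two-state space with the closed-form kernels of a chain with jump rates `a` and `b`, and an arbitrary starting probability `mu`; all of these are illustrative choices, not objects from the text.

```python
import math
from itertools import product

def kernel(t, a=1.5, b=0.5):
    """Matrix of P_t for a two-state chain with jump rates a (0 -> 1) and b (1 -> 0)."""
    lam = a + b
    d = math.exp(-lam * t)
    return [[(b + a * d) / lam, (a - a * d) / lam],
            [(b - b * d) / lam, (a + b * d) / lam]]

def finite_dim_dist(times, mu):
    """P_J for J = {t_1 < ... < t_n}, as a dict mapping (x_1, ..., x_n) to its probability, following (12.3.9)."""
    steps = [kernel(times[0])] + [kernel(times[i] - times[i - 1]) for i in range(1, len(times))]
    dist = {}
    for path in product(range(2), repeat=len(times)):
        total = 0.0
        for x0 in range(2):                 # integrate over the starting point
            p, prev = mu[x0], x0
            for K, x in zip(steps, path):   # one kernel per time increment
                p *= K[prev][x]
                prev = x
            total += p
        dist[path] = total
    return dist

def project(dist, drop_index):
    """Image of a distribution under the projection that forgets one coordinate."""
    out = {}
    for path, p in dist.items():
        key = path[:drop_index] + path[drop_index + 1:]
        out[key] = out.get(key, 0.0) + p
    return out

mu = [0.2, 0.8]
P_H = finite_dim_dist([0.4, 0.9, 1.7], mu)   # H = {0.4, 0.9, 1.7}
P_J = finite_dim_dist([0.4, 1.7], mu)        # J = H without t' = 0.9
projected = project(P_H, 1)
gap = max(abs(projected[k] - P_J[k]) for k in P_J)
print(gap, sum(P_H.values()))
```

Summing out the middle coordinate is exactly the step (12.3.11): the two kernels around $t'$ collapse into one by the semigroup property.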


In particular, if $E$ is a Polish space and $\mathfrak{B}$ is the $\sigma$-algebra of its Borel
sets, then Corollary 12.1.4 tells us that the projective family $(P_J)$ just
constructed is the family of finite-dimensional distributions of a stochastic
process with state space $E$. For a given Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$, the
family $(P_J)$ now depends only on the starting probability $\mu$ on $\mathfrak{B}$. If we
choose the canonical process belonging to $(P_J)$, then this is a process of
the form $(\Omega, \mathfrak{A}, P^\mu, (X_t)_{t \in \mathbb{R}_+})$, where only the probability measure $P^\mu$
depends on $\mu$ (and $\Omega = E^{\mathbb{R}_+}$, $\mathfrak{A} = \mathfrak{B}^{\mathbb{R}_+}$, $X_t = p_t$). Then for every single-
ton $J = \{t\} \subset \mathbb{R}_+$,

\[ P_{\{t\}}(B) = \int P_t(x_0, B)\, \mu(dx_0) = P^\mu\{X_t \in B\} \tag{12.3.12} \]

for all $B \in \mathfrak{B}$. In particular, for $\mu = \varepsilon_x$, $x \in E$, we obtain

\[ P_t(x, B) = P^{\varepsilon_x}\{X_t \in B\} \quad (B \in \mathfrak{B}). \tag{12.3.13} \]

Our introductory remarks, interpreting $P_t(x, B)$ as the probability that a
particle which starts at $x$ is found in $B$ at time $t$, are now fully justified.
By (12.3.12) we have

\[ P^\mu\{X_0 \in B\} = \int P_0(x, B)\, \mu(dx) \quad (B \in \mathfrak{B}). \]

Thus if, in particular, $P_0$ is the unit kernel, as is the case for many semi-
groups, then $P^\mu\{X_0 \in B\} = \mu(B)$. This also justifies the previously used
terminology "starting probability."
PROBLEMS


1. Prove existence and uniqueness of a Markov semigroup $(H_t)_{t \in \mathbb{R}_+}$ on
$\mathbb{R}^{p+1}$ such that, for each Borel-measurable function $f \ge 0$ on $\mathbb{R}^{p+1}$,
we have $H_0$ = unit kernel and

\[ H_t f(x, \tau) = \left(\frac{1}{2\pi t}\right)^{p/2} \int e^{-|x - y|^2/2t} f(y, \tau - t)\, dy, \]

for all $(x, \tau) \in \mathbb{R}^{p+1} = \mathbb{R}^p \times \mathbb{R}$ and all $t > 0$. $(H_t)_{t \in \mathbb{R}_+}$ is called the
semigroup of the heat equation.

2. Let $(P_t)_{t \in \mathbb{R}_+}$ and $(P'_t)_{t \in \mathbb{R}_+}$ be sub-Markov semigroups on measurable
spaces $(E, \mathfrak{B})$ and $(E', \mathfrak{B}')$, respectively. Prove existence and unique-
ness of a sub-Markov semigroup $(Q_t)_{t \in \mathbb{R}_+}$ on $(E \times E', \mathfrak{B} \otimes \mathfrak{B}')$ such
that

\[ \text{(a)} \quad Q_t f(x, x') = \int \Big[ \int f(y, y')\, P_t(x, dy) \Big] P'_t(x', dy') \]

holds for all $(x, x') \in E \times E'$ and all $\mathfrak{B} \otimes \mathfrak{B}'$-measurable functions
$f \ge 0$ on $E \times E'$. Prove that condition (a) is equivalent to

\[ \text{(b)} \quad \varepsilon_{(x, x')} Q_t = \varepsilon_x P_t \otimes \varepsilon_{x'} P'_t. \]

[In the sense of (10.3.3), $\varepsilon_x P_t$ is the measure $B \to P_t(x, B)$.] Show that
$(Q_t)$ is a Markov semigroup if $(P_t)$ and $(P'_t)$ are Markov semigroups,
and that Problem 1 is a special case of Problem 2.

3. Let $(E, \mathfrak{B})$ be a measurable space, and consider the measurable space
$(E^{\omega_0}, \mathfrak{B}^{\omega_0})$ of Section 7.4, Problem 5, where $\omega_0 \notin E$.

(a) Prove: For every sub-Markov kernel $K$ on $(E, \mathfrak{B})$ there exists
exactly one Markov kernel $K^{\omega_0}$ on $(E^{\omega_0}, \mathfrak{B}^{\omega_0})$ which extends $K$,
that is, which satisfies $K^{\omega_0}(x, B) = K(x, B)$ for all $x \in E$ and
$B \in \mathfrak{B}$.

(b) Let $(P_t)_{t \in \mathbb{R}_+}$ be a sub-Markov semigroup on $(E, \mathfrak{B})$. Prove that
$(P_t^{\omega_0})_{t \in \mathbb{R}_+}$ is a Markov semigroup on $(E^{\omega_0}, \mathfrak{B}^{\omega_0})$.

4. Let $(P_t)_{t \in \mathbb{R}_+}$ be a Markov semigroup on a measurable space $(E, \mathfrak{B})$.
A point $a \in E$ is called an absorbing point (with respect to the semi-
group) if

(a) $\varepsilon_a P_t = \varepsilon_a$ for all $t \ge 0$.

Prove: Each of the following two conditions is equivalent to this
definition:

(b) $P_t f(a) = f(a)$ for all $t \ge 0$ and all $f \in \mathcal{E}^b_{\mathfrak{B}}$;
(c) $P_t f(a) = 0$ for all $t \ge 0$ and all $f \in \mathcal{E}^b_{\mathfrak{B}}$ vanishing at $a$.

Here again $\mathcal{E}^b_{\mathfrak{B}}$ denotes the vector space of all bounded, real, $\mathfrak{B}$-meas-
urable functions on $E$. Show that the point $\omega_0$ of Problem 3 is absorbing
with respect to $(P_t^{\omega_0})_{t \in \mathbb{R}_+}$.


12.4 MARKOV PROCESSES 


We have not yet justified the conception of a randomly moving particle
"without memory" which we developed in Theorem 12.3.2 and the sub-
sequent discussions after the definition of Markov semigroups. This will
now be done.

Let $(\Omega, \mathfrak{A}, P, (X_t)_{t \in I})$ be a stochastic process with arbitrary state space
$(E, \mathfrak{B})$ and a totally ordered parameter set $I$ with order relation $\le$. In
applications we usually have $I \subset \mathbb{R}$ or even $I \subset \mathbb{R}_+$, and $\le$ is the usual
order relation of the real numbers. We therefore interpret $s < t$ as: the
"time point" $s \in I$ lies before the "time point" $t \in I$. We then consider
(as in Section 11.1, Example 1) the $\sigma$-algebra

\[ \mathfrak{A}_t = \mathfrak{A}(X_s;\ s \le t), \tag{12.4.1} \]

generated by all random variables $X_s$ with $s \le t$, $s \in I$, and call it the
$\sigma$-algebra of events up to time $t$.¹⁰ Knowledge of this means information
about the development or history of the process up to time $t$. With this
interpretation, it is natural to refine the idea of a process "without
memory" by the following definition:


12.4.1. Definition. A stochastic process $(X_t)_{t \in I}$ with totally ordered
parameter set $I$ has the elementary Markov property if, for every set $B \in \mathfrak{B}$
and every pair $s, t \in I$ with $s < t$,

\[ P\{X_t \in B \mid \mathfrak{A}_s\} = P\{X_t \in B \mid X_s\}, \quad P\text{-almost surely.}^{11} \tag{12.4.2} \]

In the sense of the interpretation developed above, we thus require that,
for the "position" of the process at time $t > s$, the information about the
development of the process up to time $s$ is equivalent to the information
about the position of the process at time $s$. Since condition (12.4.2) is
automatically satisfied for $s = t$ because of the $\mathfrak{A}_s$-measurability of $X_s$, we
can also allow pairs $s, t \in I$ with $s \le t$ in the definition.


Example 


1. If the process $(X_t)_{t \in I}$ is an independent family of random variables,
then it has the elementary Markov property. Indeed, by Corollary 5.1.5, the
$\sigma$-algebras $\mathfrak{A}_s$ and $\mathfrak{A}(X_t)$ are independent for any two elements $s, t \in I$


10 In Section 11.1, X, was denoted by 9(7. 
11 Here $P\{X_t \in B \mid \mathfrak{A}_s\}$, and so on, stands for $P(\{X_t \in B\} \mid \mathfrak{A}_s)$.
with $s < t$. Therefore by Corollary 10.1.5, for every $B \in \mathfrak{B}$ we have

\[ P\{X_t \in B \mid \mathfrak{A}_s\} = P\{X_t \in B\}, \quad P\text{-almost surely} \]

and

\[ P\{X_t \in B \mid X_s\} = P\{X_t \in B\}, \quad P\text{-almost surely.} \]

From this the assertion follows.
In particular, every independent sequence of random variables has the
elementary Markov property.


As preparation for our most important example, we give Definition
12.4.1 in an equivalent form.


12.4.2. Lemma. The process $(X_t)_{t \in I}$ has the elementary Markov
property if and only if, for every set $B \in \mathfrak{B}$ and any finitely many
elements $s_1, \ldots, s_n, t \in I$ with $s_1 < s_2 < \cdots < s_n < t$,

\[ P\{X_t \in B \mid X_{s_1}, \ldots, X_{s_n}\} = P\{X_t \in B \mid X_{s_n}\}, \quad P\text{-almost surely.} \tag{12.4.3} \]

Proof. Suppose the process has the elementary Markov property. For
$B \in \mathfrak{B}$ and elements $s_1 < \cdots < s_n < t$ from $I$ we then have

\[ P\{X_t \in B \mid \mathfrak{A}_{s_n}\} = P\{X_t \in B \mid X_{s_n}\}, \quad P\text{-almost surely.} \]

With the abbreviation $A = \{X_t \in B\}$, this is equivalent to

\[ E(1_A \mid \mathfrak{A}_{s_n}) = E(1_A \mid X_{s_n}), \quad P\text{-almost surely.} \]

Thus we obtain

\[ E\big(E(1_A \mid \mathfrak{A}_{s_n}) \mid X_{s_1}, \ldots, X_{s_n}\big) = E\big(E(1_A \mid X_{s_n}) \mid X_{s_1}, \ldots, X_{s_n}\big), \quad P\text{-almost surely.} \]

According to (10.1.18), the left (or right) side is almost surely equal to
$E(1_A \mid X_{s_1}, \ldots, X_{s_n})$ [or $E(1_A \mid X_{s_n})$].
But the almost sure equality of these conditional expectations is what
had to be shown.

Conversely, suppose the condition of the lemma is satisfied. For $B \in \mathfrak{B}$
and elements $s < t$ from $I$, we have to show that $P\{X_t \in B \mid X_s\}$
is a version of the conditional probability $P\{X_t \in B \mid \mathfrak{A}_s\}$. Now
$P\{X_t \in B \mid X_s\}$ is $\mathfrak{A}(X_s)$- and thus $\mathfrak{A}_s$-measurable. We must prove that

\[ \int_Q P\{X_t \in B \mid X_s\}\, dP = P(\{X_t \in B\} \cap Q) \tag{12.4.4} \]

for all $Q \in \mathfrak{A}_s$. By (12.4.3), this equality holds for all $Q \in \mathfrak{A}(X_{s_1}, \ldots, X_{s_n})$
for arbitrary $s_1, \ldots, s_n \in I$ satisfying $s_1 < s_2 < \cdots < s_n = s < t$. The
system $\mathfrak{E}$ of all these sets $Q$ is obviously an $\cap$-stable generator of $\mathfrak{A}_s$.
Since, on the right- and left-hand sides of (12.4.4), we have finite measures
on $\mathfrak{A}_s$ which coincide on $\mathfrak{E}$, these measures must coincide on $\mathfrak{A}(\mathfrak{E}) = \mathfrak{A}_s$ by
the Uniqueness Theorem 1.5.5. Therefore (12.4.4) holds for all $Q \in \mathfrak{A}_s$. □


The problem formulated in the introduction will now be solved in a
very satisfactory form by the following theorem.


12.4.3. Theorem. Let $(\Omega, \mathfrak{A}, P, (X_t)_{t \in \mathbb{R}_+})$ be a stochastic process with
arbitrary state space $(E, \mathfrak{B})$ and $\mathbb{R}_+$ as parameter set, whose finite-
dimensional distributions are derived according to (12.3.9) from a Markov
semigroup $(P_t)_{t \in \mathbb{R}_+}$ and a starting probability $\mu$ on $(E, \mathfrak{B})$. Then the
process has the elementary Markov property. Moreover, for arbitrary
$B \in \mathfrak{B}$ and $s, t \in \mathbb{R}_+$ with $s < t$,

\[ P\{X_t \in B \mid \mathfrak{A}_s\} = P_{t-s}(X_s, B), \quad P\text{-almost surely.}^{12} \tag{12.4.5} \]

Since, by (12.3.13), we have already been able to interpret $P_{t-s}(X_s, B)$ as
the probability that the particle starting at the point $X_s$ at time 0 is
found in $B$ at time $t - s$, our idea of a randomly moving particle without
memory is formally justified by (12.4.5).


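Proof. We begin with the proof of (12.4.5) by noting that $P_{t-s}(X_s, B)$ is
always $\mathfrak{A}(X_s)$- and hence $\mathfrak{A}_s$-measurable. Thus we have to show that

\[ \int_Q P_{t-s}(X_s, B)\, dP = P(\{X_t \in B\} \cap Q) \]

for all $Q \in \mathfrak{A}_s$ or, by the Uniqueness Theorem 1.5.5, at least for all sets $Q$
of the generator

\[ \mathfrak{E} = \bigcup_{s_1 < \cdots < s_n = s} \mathfrak{A}(X_{s_1}, \ldots, X_{s_n}) \]

already used in the proof of Lemma 12.4.2. Thus we have only to show
that for arbitrary $B \in \mathfrak{B}$ and $s_1 < \cdots < s_n < t$ from $\mathbb{R}_+$,

\[ P\{X_t \in B \mid X_{s_1}, \ldots, X_{s_n}\} = P_{t - s_n}(X_{s_n}, B), \quad P\text{-almost surely.} \tag{12.4.6} \]

Then in particular

\[ P\{X_t \in B \mid X_{s_n}\} = P_{t - s_n}(X_{s_n}, B), \quad P\text{-almost surely.} \tag{12.4.6'} \]

Together, the two equalities yield the elementary Markov property.
Now we give the proof of (12.4.6). Since $Y = P_{t - s_n}(X_{s_n}, B)$ is
$\mathfrak{A}(X_{s_1}, \ldots, X_{s_n})$-measurable, we must verify that

\[ \int_Q Y\, dP = P(\{X_t \in B\} \cap Q) \tag{12.4.7} \]

for all $Q \in \mathfrak{A}(X_{s_1}, \ldots, X_{s_n})$. We can take $Q$ in the form $Q = \{X_{s_1} \in B_1\}
\cap \cdots \cap \{X_{s_n} \in B_n\}$ ($B_1, \ldots, B_n \in \mathfrak{B}$) since these sets form an
$\cap$-stable generator of $\mathfrak{A}(X_{s_1}, \ldots, X_{s_n})$. But then

\[ \int_Q Y\, dP = \int 1_{B_1 \times \cdots \times B_n}(X_{s_1}, \ldots, X_{s_n})\, P_{t - s_n}(X_{s_n}, B)\, dP = \int 1_{B_1 \times \cdots \times B_n}(x_1, \ldots, x_n)\, P_{t - s_n}(x_n, B)\, P_J(dx_1, \ldots, dx_n), \]

where $J = \{s_1, \ldots, s_n\}$ and $P_J$ is the joint distribution of the random
variables $X_{s_1}, \ldots, X_{s_n}$. By (12.3.9) we then have

\[ \int_Q Y\, dP = \int \int \cdots \int 1_{B_1 \times \cdots \times B_n}(x_1, \ldots, x_n)\, P_{t - s_n}(x_n, B)\, P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \cdots P_{s_1}(x_0, dx_1)\, \mu(dx_0) \]
\[ = \int \int_{B_1} \cdots \int_{B_n} P_{t - s_n}(x_n, B)\, P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \cdots P_{s_1}(x_0, dx_1)\, \mu(dx_0) \]
\[ = \int \int_{B_1} \cdots \int_{B_n} \int_B P_{t - s_n}(x_n, dy)\, P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \cdots P_{s_1}(x_0, dx_1)\, \mu(dx_0) \]
\[ = P_H(B_1 \times \cdots \times B_n \times B), \]

where $H = \{s_1, \ldots, s_n, t\}$. $P_H$ is the joint distribution of the variables
$X_{s_1}, \ldots, X_{s_n}, X_t$ and therefore, because of the special form of $Q$,

\[ P_H(B_1 \times \cdots \times B_n \times B) = P\{X_{s_1} \in B_1, \ldots, X_{s_n} \in B_n, X_t \in B\} = P(\{X_t \in B\} \cap Q). \]

Thus we have arrived at (12.4.7). □


12 $P_{t-s}(X_s, B)$, of course, denotes the random variable $\omega \to P_{t-s}(X_s(\omega), B)$.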


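Property (12.4.5) is more closely connected with Markov semigroups
than Theorem 12.4.3 indicates. To clarify this, we consider a Markov
semigroup $(P_t)_{t \in \mathbb{R}_+}$ on a Polish space $E$ equipped with the $\sigma$-algebra
$\mathfrak{B} = \mathfrak{B}(E)$ of its Borel sets. After Theorem 12.3.2 we already noted that, for
every probability measure $\mu$ on $\mathfrak{B}$, there is a canonical stochastic process
$(\Omega, \mathfrak{A}, P^\mu, (X_t)_{t \in \mathbb{R}_+})$, with state space $E$, where only the probability measure
$P^\mu$ depends on $\mu$ and where the finite-dimensional distributions of the pro-
cess are derived in the sense of Theorem 12.3.2 from the semigroup $(P_t)$ and
the starting probability $\mu$. Then, in particular, for $\mu = \varepsilon_x$ (with $x \in E$),
the probability measure $P^{\varepsilon_x}$ on $\mathfrak{A}$ is defined, and we henceforth denote
it by $P^x$. By (12.3.9), for arbitrary $t_1 < \cdots < t_n$ in $\mathbb{R}_+$ and sets
$B_1, \ldots, B_n \in \mathfrak{B}$,

\[ P^x\{X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n\} = \int \int_{B_1} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x_0, dx_1)\, \varepsilon_x(dx_0) \]
\[ = \int_{B_1} \cdots \int_{B_n} P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x, dx_1), \]

and hence the mapping

\[ x \to P^x\{X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n\} \]

is always $\mathfrak{B}$-measurable. Since each of the processes $(\Omega, \mathfrak{A}, P^x, (X_t)_{t \in \mathbb{R}_+})$
is canonical, that is, $\Omega = E^{\mathbb{R}_+}$, $\mathfrak{A} = \mathfrak{B}^{\mathbb{R}_+}$ and $X_t = p_t$, we see that $\mathfrak{A}$ is
generated by the system $\mathfrak{E}$ of sets $\{X_{t_1} \in B_1\} \cap \cdots \cap \{X_{t_n} \in B_n\} =
p_J^{-1}(B_1 \times \cdots \times B_n)$ ($J = \{t_1, \ldots, t_n\}$, $t_1 < \cdots < t_n$ from $\mathbb{R}_+$, $B_1, \ldots,
B_n \in \mathfrak{B}$, $n \in \mathbb{N}$). Moreover, the system $\mathfrak{D}$ of all sets $A \in \mathfrak{A}$ for which
$x \to P^x(A)$ is measurable is a Dynkin system. Thus $\mathfrak{E} \subset \mathfrak{D}$ and hence
$\mathfrak{A} = \mathfrak{A}(\mathfrak{E}) = \mathfrak{D}(\mathfrak{E}) \subset \mathfrak{D}$, since $\mathfrak{E}$ is $\cap$-stable. Then $x \to P^x(A)$ is measur-
able for all $A \in \mathfrak{A}$, that is, $(x, A) \to P^x(A)$ is a Markov kernel from
$(E, \mathfrak{B})$ to $(\Omega, \mathfrak{A})$.

Finally, the relationships (12.3.13) and (12.4.5) hold, and accordingly,
for arbitrary $s, t \in \mathbb{R}_+$, $x \in E$, and $B \in \mathfrak{B}$,

\[ P_t(x, B) = P^x\{X_t \in B\}, \]
\[ P^x\{X_{s+t} \in B \mid \mathfrak{A}_s\} = P_t(X_s, B), \quad P^x\text{-almost surely.} \]

Consequently,

\[ P^x\{X_{s+t} \in B \mid \mathfrak{A}_s\} = P^{X_s}\{X_t \in B\}, \quad P^x\text{-almost surely.} \]

Here $P^{X_s}(A)$ denotes the random variable $\omega \to P^{X_s(\omega)}(A)$. Thus,

\[ (\Omega, \mathfrak{A}, (P^x)_{x \in E}, (X_t)_{t \in \mathbb{R}_+}) \]

is a Markov process in the sense of the following definition.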


12.4.4. Definition. Let $(E, \mathfrak{B})$ be a measurable space. Then a
quadruple

\[ (\Omega, \mathfrak{A}, (P^x)_{x \in E}, (X_t)_{t \in \mathbb{R}_+}) \]

is called a Markov process with state space $E$ [or $(E, \mathfrak{B})$] if it has the follow-
ing properties:

$(\Omega, \mathfrak{A}, P^x, (X_t)_{t \in \mathbb{R}_+})$ is a stochastic process with state space $E$
for all $x \in E$; (12.4.8)

$x \to P^x(A)$ is $\mathfrak{B}$-measurable for all $A \in \mathfrak{A}$; (12.4.9)

$P^x\{X_{s+t} \in B \mid \mathfrak{A}_s\} = P^{X_s}\{X_t \in B\}$, $P^x$-almost surely,
for all $s, t \in \mathbb{R}_+$, $x \in E$, and $B \in \mathfrak{B}$. (12.4.10)


We call (12.4.10) the weak Markov property. Therefore the processes
just defined are also called weak Markov processes. We shall not go into
the definition of the so-called strong Markov property and additional
restrictive properties that are often connected with the concept of a
Markov process.¹³
We combine the above into the following theorem:


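12.4.5. Theorem. For every Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ on a Polish
space $E$ there exists a Markov process $(\Omega, \mathfrak{A}, (P^x)_{x \in E}, (X_t)_{t \in \mathbb{R}_+})$ with state
space $E$ such that for all $t \in \mathbb{R}_+$, $x \in E$, and $B \in \mathfrak{B}(E)$,

\[ P_t(x, B) = P^x\{X_t \in B\}. \tag{12.4.11} \]

It is noteworthy that the following converse also holds:


12.4.6. Theorem. Let $(E, \mathfrak{B})$ be a measurable space and also let
$(\Omega, \mathfrak{A}, (P^x)_{x \in E}, (X_t)_{t \in \mathbb{R}_+})$ be a Markov process with state space $E$. Then

\[ P_t(x, B) = P^x\{X_t \in B\} \quad ((x, B) \in E \times \mathfrak{B}) \tag{12.4.11'} \]

defines a Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ on $(E, \mathfrak{B})$.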


Proof. Each of the mappings $P_t: E \times \mathfrak{B} \to \mathbb{R}_+$ is a Markov kernel
on $(E, \mathfrak{B})$, since $x \to P^x(A)$ is $\mathfrak{B}$-measurable for every $A \in \mathfrak{A}$, and thus,
in particular, for

\[ A = \{X_t \in B\} = X_t^{-1}(B); \]

and since $B \to P_t(x, B)$ is the distribution of the random variable $X_t$
relative to $P^x$ and is thus a probability measure on $\mathfrak{B}$. Thus all we still
have to show is the semigroup property. Let $(x, B) \in E \times \mathfrak{B}$ and $s, t \in \mathbb{R}_+$.
Then by the weak Markov property and (10.1.6),¹⁴

\[ P_{s+t}(x, B) = P^x\{X_{s+t} \in B\} = E^x(1_{\{X_{s+t} \in B\}}) = E^x\big(E^x(1_{\{X_{s+t} \in B\}} \mid \mathfrak{A}_s)\big) \]
\[ = E^x\big(P^x\{X_{s+t} \in B \mid \mathfrak{A}_s\}\big) = E^x\big(P^{X_s}\{X_t \in B\}\big). \]

In view of (12.4.11'), we have

\[ P_{s+t}(x, B) = E^x(P_t(X_s, B)) = \int P_t(X_s(\omega), B)\, P^x(d\omega) = \int P_t(y, B)\, P^x_{X_s}(dy). \]

But according to (12.4.11'), the distribution $P^x_{X_s}$ of $X_s$ relative to $P^x$ is
the probability measure $B \to P_s(x, B)$. Thus we finally obtain

\[ P_{s+t}(x, B) = \int P_s(x, dy)\, P_t(y, B), \]

and hence (12.3.3). □


13 See Meyer [39], Blumenthal-Getoor [24].
14 $E^x$ ($E^\mu$) denotes the expected value relative to $P^x$ ($P^\mu$).

The circle of reasoning will be completed if we can show that the pro-
jective family $(P^x_J)$, associated according to Theorem 12.3.2 with the
semigroup $(P_t)$ from Theorem 12.4.6 and with $\mu = \varepsilon_x$, is just the family
of finite-dimensional distributions of the process $(\Omega, \mathfrak{A}, P^x, (X_t)_{t \in \mathbb{R}_+})$.


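12.4.7. Corollary. Suppose we have the situation of Theorem 12.4.6.
Then for every $x \in E$ and $J \in \mathfrak{H}(\mathbb{R}_+)$, the finite-dimensional distribution
$P^x_J$ of the process $(\Omega, \mathfrak{A}, P^x, (X_t)_{t \in \mathbb{R}_+})$ is given by

\[ P^x_J(B) = \int \cdots \int 1_B(x_1, \ldots, x_n)\, P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x, dx_1) \quad (B \in \mathfrak{B}^J). \tag{12.4.12} \]

Here $t_1 < \cdots < t_n$ are the elements of $J$.

Proof. It suffices to prove (12.4.12) for sets $B = B_1 \times \cdots \times B_n$
with $B_1, \ldots, B_n \in \mathfrak{B}$. This is done by induction on $n$. The case $n = 1$
is contained in the definition (12.4.11') of the semigroup $(P_t)$. Thus, sup-
pose the assertion has been proved for $n - 1$. Then we can proceed as
follows: $Q = \{X_{t_1} \in B_1\} \cap \cdots \cap \{X_{t_{n-1}} \in B_{n-1}\}$ lies in $\mathfrak{A}_{t_{n-1}}$; hence,
by the weak Markov property and (12.4.11'),

\[ P^x_J(B_1 \times \cdots \times B_n) = P^x(\{X_{t_n} \in B_n\} \cap Q) \]
\[ = \int_Q P^x\{X_{t_n} \in B_n \mid \mathfrak{A}_{t_{n-1}}\}\, dP^x \]
\[ = \int_Q P^{X_{t_{n-1}}}\{X_{t_n - t_{n-1}} \in B_n\}\, dP^x \]
\[ = \int 1_Q\, P^{X_{t_{n-1}}}\{X_{t_n - t_{n-1}} \in B_n\}\, dP^x \]
\[ = \int 1_{B_1 \times \cdots \times B_{n-1}}(X_{t_1}, \ldots, X_{t_{n-1}})\, P_{t_n - t_{n-1}}(X_{t_{n-1}}, B_n)\, dP^x. \]

Now we use the transformation theorem and the induction hypothesis,
according to which the joint distribution of $X_{t_1}, \ldots, X_{t_{n-1}}$ relative to
$P^x$ is of the form (12.4.12).

Then we have

\[ P^x_J(B_1 \times \cdots \times B_n) = \int \cdots \int 1_{B_1 \times \cdots \times B_{n-1}}(x_1, \ldots, x_{n-1})\, P_{t_n - t_{n-1}}(x_{n-1}, B_n)\, P_{t_{n-1} - t_{n-2}}(x_{n-2}, dx_{n-1}) \cdots P_{t_1}(x, dx_1) \]
\[ = \int \cdots \int 1_{B_1 \times \cdots \times B_n}(x_1, \ldots, x_n)\, P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \cdots P_{t_1}(x, dx_1). \]

But this is what we wanted to show. □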




Combining our results, we can establish that Markov semigroups and 
Markov processes can be interpreted as the same mathematical object. 
Experience tells us that the more complicated concept of a Markov process 
provides us with a powerful probability-theoretic tool for studying Markov 
semigroups. 

Because of (12.4.11'), we call the Markov semigroup associated with a
Markov process the semigroup of transition probabilities of the process.
By Corollary 12.4.7 we can rederive the measure $P^\mu$ associated with an
arbitrary starting probability $\mu$ on $\mathfrak{B}$ from the measures $(P^x)_{x \in E}$ of a
Markov process. Obviously,

\[ P^\mu = \int P^x\, \mu(dx). \tag{12.4.13} \]

We conclude with one of the simplest examples for a Markov process.


Example 


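2. Let $\Omega = E = \mathbb{R}^p$ ($p = 1, 2, \ldots$), $\mathfrak{A} = \mathfrak{B} = \mathfrak{B}^p$; let $P^x$ be the measure
$\varepsilon_x$ on $\mathfrak{A}$ defined by the unit mass at $x$ for every $x \in E = \Omega$; and let $v \in E$
be chosen arbitrarily. If we define $X_t: \Omega \to E$ by

\[ X_t(\omega) = \omega + vt, \]

then $(\Omega, \mathfrak{A}, P^x, (X_t)_{t \in \mathbb{R}_+})$ is a stochastic process with state space $E$ whose
paths are all of the form $t \to y + vt$ with $y \in E$. The process thus describes
the motion of a particle starting at $x$ with constant (directed) velocity $v$.
Since every $X_t$ is a translation in $E$, we have $\mathfrak{A}(X_t) = \mathfrak{B}^p = \mathfrak{A}$ and hence
$\mathfrak{A}_t = \mathfrak{A}$ for all $t \in \mathbb{R}_+$.

Then $(\Omega, \mathfrak{A}, (P^x)_{x \in E}, (X_t)_{t \in \mathbb{R}_+})$ is a Markov process. Indeed, for every
$A \in \mathfrak{A}$, we see that $x \to P^x(A) = 1_A(x)$ is $\mathfrak{B}$-measurable. Since $\{X_t \in B\} =
B - vt$ and $\mathfrak{A}_t = \mathfrak{A}$ for all $t \in \mathbb{R}_+$ and $B \in \mathfrak{B}$, we also immediately obtain
the weak Markov property: On the one hand, we have

\[ P^x\{X_{s+t} \in B \mid \mathfrak{A}_s\} = 1_{\{X_{s+t} \in B\}} = 1_{B - v(s+t)}, \quad P^x\text{-almost surely;} \]

on the other hand, for all $\omega \in \Omega$,

\[ P^{X_s(\omega)}\{X_t \in B\} = 1_{B - vt}(\omega + vs) = 1_{B - v(s+t)}(\omega), \]

and thus

\[ P^x\{X_{s+t} \in B \mid \mathfrak{A}_s\} = P^{X_s}\{X_t \in B\}, \quad P^x\text{-almost surely} \]

($s, t \in \mathbb{R}_+$, $x \in E$, $B \in \mathfrak{B}$).
The associated Markov semigroup $(P_t)_{t \in \mathbb{R}_+}$ is given by

\[ P_t(x, B) = 1_{B - vt}(x) = \varepsilon_x(B - vt). \]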
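
The semigroup of this example is simple enough to evaluate exactly. The sketch below takes $p = 1$, a velocity `v = 2.0` and a closed interval $B$, all of which are illustrative assumptions; the sample values are dyadic so that the floating-point arithmetic is exact.

```python
def P(t, x, B, v=2.0):
    """P_t(x, B) = 1_B(x + v t) for a closed interval B = (lo, hi): the particle sits at x + v t."""
    lo, hi = B
    return 1.0 if lo <= x + v * t <= hi else 0.0

def chapman_kolmogorov_rhs(s, t, x, B, v=2.0):
    """Integral of P_s(x, dy) P_t(y, B): P_s(x, .) is the unit mass at y = x + v s, so it equals P_t(y, B)."""
    y = x + v * s
    return P(t, y, B, v)

B = (0.0, 2.0)
times = [0.0, 0.25, 0.5, 1.0]
starts = [-1.0, 0.0, 0.5]
mismatches = [
    (s, t, x)
    for s in times for t in times for x in starts
    if P(s + t, x, B) != chapman_kolmogorov_rhs(s, t, x, B)
]
print(mismatches)
```

The empty list of mismatches reflects the identity $1_{B - vt}(x + vs) = 1_{B - v(s+t)}(x)$ used in the example.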
PROBLEMS


1. Let $(E, \mathfrak{B})$ be a measurable space, and let $(\Omega, \mathfrak{A}, (P^x)_{x \in E}, (X_t)_{t \in \mathbb{R}_+})$ be a
quadruple with properties (12.4.8) and (12.4.9). Prove that the weak
Markov property is equivalent to the following property:

\[ P^x\{X_t \in B \mid X_{t_1}, \ldots, X_{t_n}\} = P^{X_{t_n}}\{X_{t - t_n} \in B\}, \quad P^x\text{-almost surely,} \]

for all $x \in E$, $0 \le t_1 < \cdots < t_n < t$, and all $B \in \mathfrak{B}$. [Hint: Imitate
the proof of Lemma 12.4.2.]

2. Consider the following subset of $\mathbb{R}^2$:

\[ \Omega = \{0\} \times \mathbb{R}_+ \cup\, ]0, +\infty[\, \times \{0\}. \]

Let $\mathfrak{A}$ be the trace in $\Omega$ of the Borel $\sigma$-algebra $\mathfrak{B}^2$, and define for each
$x \in \mathbb{R}_+$ the following probability measure $P^x$ on $\mathfrak{A}$: for $x > 0$,
$P^x = \varepsilon_{(x, 0)}$; for $x = 0$, $P^0(A) = \int 1_A(0, y) e^{-y}\, dy$. Furthermore, for
$t \in \mathbb{R}_+$, we define $X_t: \Omega \to \mathbb{R}_+$ as follows:


X,(2,0) =az-+t (x = 0), 
31430, i4 sy 


Use the result of Problem 1 and prove that (0,%,(P").cr,, (Xvier,) 

is a Markov process with R, as state space. Determine the semigroup 

(P4) of transition probabilities, and prove in particular P°{X, = 0} = 

et. (This Markov process describes the movement of a particle in Ry 

with constant velocity 1 but with the additional property that it 
leaves the origin 0 only after an “exponential holding time.") 

3. Let $(\Omega,\mathfrak{A},(P^x)_{x\in E},(X_t)_{t\in\mathbb{R}_+})$ be a Markov process with a measurable
space $(E,\mathfrak{B})$ as state space. A set $A$ of the $\sigma$-algebra $\mathfrak{B}$ is called
absorbing if $P^x\{X_t \in A\} = 1$ for all $x \in A$ and all $t \in \mathbb{R}_+$, that is, if
the process starting at any $x \in A$ does not leave $A$ $P^x$-almost surely.
In particular, a point $a \in E$ is called absorbing (with respect to the
process) if $\{a\} \in \mathfrak{B}$ and $\{a\}$ is an absorbing set.

(a) Prove: A point $a \in E$ with $\{a\} \in \mathfrak{B}$ is absorbing with respect
to a Markov process if and only if $a$ is an absorbing point with
respect to the semigroup $(P_t)$ of transition probabilities (see
Section 12.3, Problem 4).

(b) Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^{p+1}},(X_t)_{t\in\mathbb{R}_+})$ be a Markov process with the
semigroup $(H_t)$ of the heat equation (Section 12.3, Problem 1)
as semigroup of transition probabilities. For each $t_0 \in \mathbb{R}$ denote
by $A_{t_0}$ the closed half space

$$A_{t_0} = \{(x,\tau) \in \mathbb{R}^p \times \mathbb{R} : \tau \le t_0\}$$

of $\mathbb{R}^{p+1} = \mathbb{R}^p \times \mathbb{R}$. Prove that $A_{t_0}$ is an absorbing set with
respect to this Markov process.

4. Let $E$ be a locally compact space with countable base, $E' = E \cup \{\omega_0\}$
its one-point compactification, and let $(P_t)_{t\in\mathbb{R}_+}$ be a sub-Markov
semigroup of kernels on $(E,\mathfrak{B}(E))$. Prove the existence of a Markov
process $(\Omega,\mathfrak{A},(P^x)_{x\in E'},(X_t)_{t\in\mathbb{R}_+})$ with $(E',\mathfrak{B}(E'))$ as state space and
with the following properties: (a) $P_t(x,B) = P^x\{X_t \in B\}$ for all
$x \in E$ and $B \in \mathfrak{B}(E)$; (b) $\omega_0$ is an absorbing point. [Hint: Use
Section 12.3, Problem 3.]


12.5 PROCESSES WITH STATIONARY 
AND INDEPENDENT INCREMENTS 


In Section 12.3, Example 3, we became acquainted with the concept
of a translation-invariant Markov semigroup $(P_t)_{t\in\mathbb{R}_+}$ on $\mathbb{R}^p$. If
$(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$ is a Markov process with $\mathbb{R}^p$ as state space and
$(P_t)_{t\in\mathbb{R}_+}$ as semigroup of its transition probabilities, then (12.4.11) tells
us that $(P_t)$ is translation-invariant if and only if

$$P^{x+z}\{X_t \in B + z\} = P^x\{X_t \in B\} \tag{12.5.1}$$

for all $x, z \in \mathbb{R}^p$, $B \in \mathfrak{B}^p$, and $t \in \mathbb{R}_+$. Naturally, this property is said
to define the class of translation-invariant Markov processes in $\mathbb{R}^p$. Two
of its properties will be stressed here. Thus we define:


12.5.1. Definition. Let $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ be a stochastic process with
$\mathbb{R}^p$ as state space.

(a) We call $(X_t)_{t\in\mathbb{R}_+}$ a process with stationary increments if there exists
a family $(\mu_t)_{t\in\mathbb{R}_+}$ of probability measures on $\mathfrak{B}^p$ such that

$$P_{X_t - X_s} = \mu_{t-s} \tag{12.5.2}$$

for arbitrary $s, t \in \mathbb{R}_+$ with $s \le t$.¹⁵

(b) We call $(X_t)_{t\in\mathbb{R}_+}$ a process with independent increments if for every
finitely many $t_0, t_1, \ldots, t_n \in \mathbb{R}_+$ with $0 = t_0 < t_1 < \cdots < t_n$, the
random variables

$$X_{t_0},\ X_{t_1} - X_{t_0},\ X_{t_2} - X_{t_1},\ \ldots,\ X_{t_n} - X_{t_{n-1}}$$

are independent.


15 Somewhat imprecisely, this means that the distribution $P_{X_t - X_s}$ for
$0 \le s \le t < +\infty$ depends only on the difference $t - s$. Obviously, we always have
$\mu_0 = \varepsilon_0$.




These concepts are of interest first of all because of the following two 
theorems: 


12.5.2. Theorem. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$ be a translation-
invariant Markov process with state space $\mathbb{R}^p$ and a semigroup $(P_t)_{t\in\mathbb{R}_+}$
of transition probabilities. Then for every $x \in \mathbb{R}^p$, $(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ is a
process with stationary increments. For arbitrary $s, t \in \mathbb{R}_+$ with $s \le t$,
the probability measure $B \to P_{t-s}(0,B)$ on $\mathfrak{B}^p$ is the distribution of
$X_t - X_s$ relative to $P^x$.


Proof. As in Section 12.3, Example 3, we denote the probability
measure $B \to P_t(0,B)$ by $\mu_t$ $(t \in \mathbb{R}_+)$. Then we have to show that

$$P^x_{X_t - X_s} = \mu_{t-s}$$

for arbitrary $s, t \in \mathbb{R}_+$ with $s \le t$. The case $s = t$ is disposed of by means
of Corollary 9.3.7: According to Section 12.3, Example 3, $(\mu_t)_{t\in\mathbb{R}_+}$ is a
convolution semigroup of probability measures and thus $\mu_0 = \varepsilon_0 = P^x_{X_t - X_t}$.¹⁶
When $s < t$, we set $Y = X_s \otimes X_t$ and let $d\colon \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}^p$
denote the continuous and thus Borel-measurable mapping $(z_1,z_2) \to z_2 - z_1$.
Then $P^x_{X_t - X_s}$ is the distribution of $d \circ Y$ relative to $P^x$. Therefore,
for every $B \in \mathfrak{B}^p$, we have

$$P^x\{X_t - X_s \in B\} = P^x\{d \circ Y \in B\} = P^x\{Y \in d^{-1}(B)\},$$

and thus, by (12.4.12),

$$\begin{aligned}
P^x\{X_t - X_s \in B\} &= \iint 1_{d^{-1}(B)}(z_1,z_2)\, P_{t-s}(z_1,dz_2)\, P_s(x,dz_1) \\
&= \int P_{t-s}(z_1, z_1 + B)\, P_s(x,dz_1) \\
&= P_{t-s}(0,B) \int P_s(x,dz_1) = \mu_{t-s}(B).
\end{aligned}$$

Here the translation-invariance of $P_{t-s}$, that is, the equality

$$P_{t-s}(z_1, z_1 + B) = P_{t-s}(0,B),$$

was used for all $z_1 \in \mathbb{R}^p$. ∎


12.5.3. Theorem. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$ be a translation-
invariant Markov process with $\mathbb{R}^p$ as state space. Then for every $x \in \mathbb{R}^p$,

$$(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$$

is a process with independent increments.


16 Thus $P_0$ is the unit kernel $I$ in every translation-invariant Markov semigroup
$(P_t)_{t\in\mathbb{R}_+}$ on $(\mathbb{R}^p,\mathfrak{B}^p)$. (However, see Section 12.3, Example 7.)

Proof. Let $(P_t)_{t\in\mathbb{R}_+}$ be the Markov semigroup associated with the
given Markov process. This is translation-invariant and thus $P_t(x,B) =
P_t(x + z, B + z)$, that is,

$$\int 1_B(y)\,P_t(x,dy) = \int 1_{B+z}(y)\,P_t(x+z,dy) = \int 1_B(y - z)\,P_t(x+z,dy).$$

By Theorem 2.3.6 this is equivalent to

$$\int f(y)\,P_t(x,dy) = \int f(y - z)\,P_t(x+z,dy) \qquad (x, z \in \mathbb{R}^p)$$

for all $\mathfrak{B}^p$-measurable functions $f$ which are $\ge 0$ or bounded. The
independence of

$$Y_0 = X_{t_0},\quad Y_1 = X_{t_1} - X_{t_0},\quad \ldots,\quad Y_n = X_{t_n} - X_{t_{n-1}}$$

to be proved (where $0 = t_0 < t_1 < \cdots < t_n$) is equivalent, by Theorems
5.2.4, 8.1.6, and the Uniqueness Theorem 8.2.4 for Fourier transforms,
to the validity of

$$\varphi_{Y_0 \otimes \cdots \otimes Y_n}(y_0, \ldots, y_n) = \varphi_{Y_0}(y_0) \cdots \varphi_{Y_n}(y_n) \qquad (y_0, \ldots, y_n \in \mathbb{R}^p),$$

where $\varphi_X$ denotes the characteristic function of the random variable $X$.
Thus we have to show that

$$E^x\Bigl(e^{\,i\sum_{j=0}^{n}\langle y_j, Y_j\rangle}\Bigr) = \prod_{j=0}^{n} E^x\bigl(e^{\,i\langle y_j, Y_j\rangle}\bigr)$$

for all $x, y_0, \ldots, y_n \in \mathbb{R}^p$. Here $E^x$ denotes the expected value relative
to $P^x$. By (12.4.12), we know the joint distribution of the random variables

$$X_{t_0}, \ldots, X_{t_n}$$

relative to $P^x$. Therefore,

$$\begin{aligned}
E^x\bigl(e^{\,i\sum \langle y_j, Y_j\rangle}\bigr) &= E^x\bigl(e^{\,i\sum \langle y_j,\, X_{t_j} - X_{t_{j-1}}\rangle}\bigr) \\
&= \int \cdots \int e^{\,i\sum \langle y_j,\, z_j - z_{j-1}\rangle}\, P_{t_n - t_{n-1}}(z_{n-1},dz_n) \cdots P_{t_0}(x,dz_0),
\end{aligned}$$

where we agree that $X_{t_{-1}} = 0$ and $z_{-1} = 0$ (because of the summation
from $j = 0$ to $j = n$). In the first integration we have to compute

$$q_n = \int e^{\,i\langle y_n,\, z_n - z_{n-1}\rangle}\, P_{t_n - t_{n-1}}(z_{n-1},dz_n),$$

and thus, according to the preliminaries, to compute

$$q_n = \int e^{\,i\langle y_n, z_n\rangle}\, P_{t_n - t_{n-1}}(0,dz_n).$$

Since, by Theorem 12.5.2, $B \to P_{t_n - t_{n-1}}(0,B)$ is the distribution of

$$Y_n = X_{t_n} - X_{t_{n-1}}$$

relative to $P^x$, we obtain

$$q_n = E^x\bigl(e^{\,i\langle y_n, Y_n\rangle}\bigr).$$

But this expression is independent of $z_1, \ldots, z_{n-1}$, so that the next
integration can be carried out analogously, and we finally obtain the
desired equality. ∎


Remark. Of course the random variables $X_s$ and $X_t$ corresponding to
times $s < t$ of a translation-invariant Markov process with state space $\mathbb{R}^p$
are in general not independent relative to $P^x$. Independence would mean
$P^x_{X_s \otimes X_t} = P^x_{X_s} \otimes P^x_{X_t}$, that is,

$$\iint 1_B(z_1,z_2)\, P_{t-s}(z_1,dz_2)\, P_s(x,dz_1) = P^x_{X_s} \otimes P^x_{X_t}(B)$$

for all $B \in \mathfrak{B}^p \otimes \mathfrak{B}^p$, and thus

$$\int_{B_1} P_{t-s}(z_1,B_2)\, P_s(x,dz_1) = P_s(x,B_1)\, P_t(x,B_2)$$

for all $B_1, B_2 \in \mathfrak{B}^p$. We can easily verify that this equality fails, for
example, for a Markov process corresponding to a Poisson or Brownian
semigroup.


We discuss another interesting connection with the convolution semi- 
group of probability measures considered in Section 9.3 and thus with 
infinitely divisible distributions. 


12.5.4. Theorem. For every stochastic process $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ with
stationary and independent increments (and with $\mathbb{R}^p$ as state space),
$(P_{X_t - X_0})_{t\in\mathbb{R}_+}$ is a convolution semigroup of probability measures on $\mathbb{R}^p$.
Conversely, for every convolution semigroup $(\mu_t)_{t\in\mathbb{R}_+}$ of probability
measures on $\mathbb{R}^p$, there exists a process $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ with stationary
and independent increments such that $P_{X_t - X_s} = \mu_{t-s}$ for all $s, t \in \mathbb{R}_+$
with $s \le t$.


Proof. Let $(X_t)_{t\in\mathbb{R}_+}$ be a process with stationary and independent
increments and $\mu_t = P_{X_t - X_0}$. For every two numbers $s, t \in \mathbb{R}_+$, $X_{s+t} - X_s$
and $X_s - X_0$ are then independent random variables, and thus, by
Theorem 5.3.4,

$$\mu_{s+t} = P_{X_{s+t} - X_0} = P_{X_s - X_0} * P_{X_{s+t} - X_s} = \mu_s * \mu_t.$$

Conversely, let $(\mu_t)_{t\in\mathbb{R}_+}$ be a convolution semigroup of probability
measures on $\mathbb{R}^p$. We shall construct the desired stochastic process with
the help of Kolmogorov's Theorem 12.1.3 and therefore begin with the
construction of a projective family $(P_J)_{J\in\mathfrak{H}(\mathbb{R}_+)}$. We use the abbreviation
$E = \mathbb{R}^p$. For every set $J \in \mathfrak{H}(\mathbb{R}_+)$ with elements $t_1 < \cdots < t_n$, we
define the following linear (and therefore also continuous) mapping
$T_J\colon E^J \to E^J$. For every point $(x_1, \ldots, x_n) \in E^J$, let

$$T_J(x_1, \ldots, x_n) = (x_1,\ x_1 + x_2,\ \ldots,\ x_1 + x_2 + \cdots + x_n).$$

Then $T_J$ is bijective. For the inverse mapping we obviously have

$$T_J^{-1}(x_1, \ldots, x_n) = (x_1,\ x_2 - x_1,\ \ldots,\ x_n - x_{n-1}).$$

If we set

$$P_J = T_J(\mu_{t_1} \otimes \mu_{t_2 - t_1} \otimes \cdots \otimes \mu_{t_n - t_{n-1}}) \tag{12.5.3}$$

then $(P_J)_{J\in\mathfrak{H}(\mathbb{R}_+)}$ is a family of probability measures on the spaces $E^J$.
To verify the projectivity, let $H$ be another set from $\mathfrak{H}(\mathbb{R}_+)$ such that
$H \setminus J$ contains exactly one element $t'$. Then, according to Remark 1 of
Section 12.1 we need only verify that $p^H_J(P_H) = P_J$.¹⁷ We prove this for
the case $t_{n-1} < t' < t_n$. The remaining cases can be taken care of analogously
(only with somewhat more writing); the case $t_n < t'$ is even trivial.¹⁸

For every point $(x_1, \ldots, x_{n-1}, x', x_n) \in E^H$,

$$p^H_J \circ T_H(x_1, \ldots, x_{n-1}, x', x_n) = (x_1,\ x_1 + x_2,\ \ldots,\ x_1 + \cdots + x_{n-1},\ x_1 + \cdots + x_{n-1} + x' + x_n).$$

For every Borel-measurable function $f \ge 0$ on $E^J$ we therefore have, when
we take into account the semigroup property of $(\mu_t)$,

$$\begin{aligned}
&\int f \circ p^H_J \circ T_H \; d(\mu_{t_1} \otimes \cdots \otimes \mu_{t' - t_{n-1}} \otimes \mu_{t_n - t'}) \\
&\quad= \int \cdots \int f(x_1, \ldots, x_1 + \cdots + x_{n-1} + x' + x_n)\; \mu_{t_n - t'}(dx_n)\, \mu_{t' - t_{n-1}}(dx') \cdots \mu_{t_1}(dx_1) \\
&\quad= \int \cdots \int f(x_1, \ldots, x_1 + \cdots + x_{n-1} + y)\; (\mu_{t' - t_{n-1}} * \mu_{t_n - t'})(dy)\, \mu_{t_{n-1} - t_{n-2}}(dx_{n-1}) \cdots \mu_{t_1}(dx_1) \\
&\quad= \int \cdots \int f(x_1, \ldots, x_1 + \cdots + x_n)\; \mu_{t_n - t_{n-1}}(dx_n) \cdots \mu_{t_1}(dx_1) \\
&\quad= \int f \, dP_J.
\end{aligned}$$

Consequently, $p^H_J(P_H) = p^H_J \circ T_H(\mu_{t_1} \otimes \cdots \otimes \mu_{t_n - t'}) = P_J$, and thus
the family $(P_J)$ is indeed projective.

Let $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ be a stochastic process (which exists by Theorem
12.1.3) whose family of finite-dimensional distributions is just $(P_J)$. Then
for every set $J \in \mathfrak{H}(\mathbb{R}_+)$ with elements $t_1 < \cdots < t_n$,

$$P_{X_{t_1} \otimes \cdots \otimes X_{t_n}} = P_J.$$

By using $T_J^{-1}$, we now obtain (because of the transitivity of image
measures)

$$P_{X_{t_1} \otimes (X_{t_2} - X_{t_1}) \otimes \cdots \otimes (X_{t_n} - X_{t_{n-1}})} = T_J^{-1}(P_J) = \mu_{t_1} \otimes \mu_{t_2 - t_1} \otimes \cdots \otimes \mu_{t_n - t_{n-1}}. \tag{12.5.4}$$

If we now apply the natural projections of $E^J$ on the $n$ factors $E$, it follows
that

$$P_{X_{t_1}} = \mu_{t_1},\quad P_{X_{t_2} - X_{t_1}} = \mu_{t_2 - t_1},\quad \ldots,\quad P_{X_{t_n} - X_{t_{n-1}}} = \mu_{t_n - t_{n-1}}. \tag{12.5.5}$$


17 $p^H_J$ again denotes the natural projection of $E^H$ onto $E^J$ [see (12.1.3)].


18 See also the proof of Theorem 12.3.2, which is carried out similarly.

The equalities (12.5.5) tell us that we are dealing with a process with
stationary increments. We need only take account of Corollary 9.3.7,
whereby $\mu_0 = \varepsilon_0$, and thus $P_{X_t - X_s} = \mu_{t-s}$ also holds for $t = s$. By Theorem
5.2.4, the independence of the increments follows from (12.5.4) and
(12.5.5). ∎
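
The construction in this proof can be made concrete with a short sketch. The mapping $T_J$ is a running sum of increments and $T_J^{-1}$ takes successive differences, so a path at finitely many times is obtained by drawing independent increments and accumulating them. The function names and the choice of the Gaussian semigroup $\mu_t = N(0,t)$ below are illustrative assumptions, not part of the text; any sampler for $\mu_{t}$ could be substituted.

```python
import random
from itertools import accumulate


def T(increments):
    """Running sums: (x1, ..., xn) -> (x1, x1+x2, ..., x1+...+xn)."""
    return list(accumulate(increments))


def T_inv(values):
    """Successive differences: (x1, ..., xn) -> (x1, x2-x1, ..., xn-x_{n-1})."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def gaussian_increment(dt, rng):
    # Assumed example semigroup: mu_dt = N(0, dt); mu_0 is the unit mass at 0.
    return rng.gauss(0.0, dt ** 0.5) if dt > 0 else 0.0


def sample_path(times, draw_increment, rng):
    """Sample (X_{t_1}, ..., X_{t_n}) for 0 <= t_1 < ... < t_n.

    draw_increment(dt, rng) is a hypothetical sampler for mu_dt.
    """
    gaps = [times[0]] + [t - s for s, t in zip(times, times[1:])]
    return T([draw_increment(dt, rng) for dt in gaps])


rng = random.Random(0)
path = sample_path([0.0, 0.5, 1.0, 2.0], gaussian_increment, rng)
print(path)
```

Applying `T_inv` to the sampled values recovers the increments, which mirrors the step from (12.5.3) to (12.5.4).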


Finally, real-valued stochastic processes with independent increments 
provide us with further important examples for martingales. We have: 


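12.5.5. Theorem. Let $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ be a stochastic process with
independent increments and the real line $\mathbb{R}$ as state space. If all the
random variables $X_t$ are integrable, then

$$E(X_t \mid \mathfrak{A}_s) = E(X_t - X_s) + X_s \quad P\text{-almost surely} \tag{12.5.6}$$

for arbitrary $s, t \in \mathbb{R}_+$ with $s < t$.

Proof. As in the proof of Lemma 12.4.2, we only have to prove

$$E(X_t \mid X_{t_1}, \ldots, X_{t_n}) = E(X_t - X_s) + X_s, \quad \text{almost surely}$$

for arbitrary $t_1, \ldots, t_n \in \mathbb{R}_+$ with $0 = t_1 < \cdots < t_n = s < t$. But now

$$\begin{aligned}
E(X_t \mid X_{t_1}, \ldots, X_{t_n})
&= E(X_t \mid X_{t_1},\, X_{t_2} - X_{t_1},\, \ldots,\, X_{t_n} - X_{t_{n-1}}) \\
&= E(X_t - X_s \mid X_{t_1},\, X_{t_2} - X_{t_1},\, \ldots,\, X_{t_n} - X_{t_{n-1}}) + E(X_s \mid X_{t_1},\, \ldots,\, X_{t_n} - X_{t_{n-1}}) \\
&= E(X_t - X_s) + X_s.
\end{aligned}$$

Hence the assertion follows, by Corollary 10.1.5 and by (10.1.7). ∎


This proves that the stochastic process from Theorem 12.5.5 is a super-
(or sub-) martingale if and only if

$$E(X_t - X_s) \le 0 \quad (\text{or } \ge 0)$$

for arbitrary numbers $s, t \in \mathbb{R}_+$ with $s < t$. Moreover, if the process also
has stationary increments and $(\mu_t)_{t\in\mathbb{R}_+}$ is the associated convolution semigroup
according to Theorem 12.5.4, then

$$E(X_t - X_s) = \int x\, \mu_{t-s}(dx) \qquad (s, t \in \mathbb{R}_+,\ s < t).$$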
PROBLEMS 


1. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}},(X_t)_{t\in\mathbb{R}_+})$ be a Markov process associated with the
Brownian semigroup on $\mathbb{R}$ (in the sense of Theorem 12.4.5). Prove
that each of the stochastic processes $(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ with $x \in \mathbb{R}$ is a
martingale.

2. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$ be a Markov process associated with the
Poisson semigroup on $\mathbb{Z}$. Prove that each of the stochastic processes
$(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ with $x \in \mathbb{Z}$ is a sub-martingale.


12.6 THE BROWNIAN MOTION PROCESS


According to Theorem 12.4.5, there is a Markov process associated with
the Brownian semigroup on $\mathbb{R}^p$ (Section 12.3, Example 4). This section is
devoted to a careful study of this process, and in particular, of the
behavior of its paths.


12.6.1. Definition. Any Markov process $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$ with state
space $\mathbb{R}^p$ whose semigroup of transition probabilities is the Brownian
semigroup on $\mathbb{R}^p$ and whose paths are all continuous is called a
Brownian motion process (or Brownian motion) in $\mathbb{R}^p$.


The existence of such a process is given by the following theorem.


12.6.2. Theorem. There exists a Brownian motion process
$(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$. In particular, $\Omega$ can be chosen equal to the set
$C \subset (\mathbb{R}^p)^{\mathbb{R}_+}$ of all continuous mappings of $\mathbb{R}_+$ into $\mathbb{R}^p$ and $\mathfrak{A}$ equal to the
trace of $(\mathfrak{B}^p)^{\mathbb{R}_+}$ in $C$.


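Proof. The existence of a Markov process $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$
with the Brownian semigroup $(P_t)$ as semigroup of transition probabilities
is given by Theorem 12.4.5. Since this is a combination of Theorems 12.3.2
and 12.4.3, we can assume that each of the processes $(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ is
canonical, and thus in particular that $\Omega = (\mathbb{R}^p)^{\mathbb{R}_+}$ and $\mathfrak{A} = (\mathfrak{B}^p)^{\mathbb{R}_+}$ and
that every $P^x$ is the projective limit of the projective family $(P^x_J)_{J\in\mathfrak{H}(\mathbb{R}_+)}$
given by (12.4.12). We have to show that $C$ is essential relative to this
family $(P^x_J)$. By Lemma 12.2.8, a verification of conditions (c) and (d) of
Theorem 12.2.6 for the process $(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ will suffice. But this
can be done as follows:

Since the Brownian semigroup is translation-invariant, by Theorem
12.5.2 we know the distribution of $X_t - X_s$ relative to $P^x$ for all $s, t \in \mathbb{R}_+$
with $s < t$. It is

$$P^x_{X_t - X_s} = g_{t-s}\,\lambda^p, \tag{12.6.1}$$

where $g_{t-s}$ is the density from (12.3.8). Therefore, for arbitrary $t \in \mathbb{R}_+$,
$\eta > 0$, and $\delta > 0$,

$$P^x\{|X_{t+\delta} - X_t| \ge \eta\} = \int_{A_\eta} g_\delta(z)\, \lambda^p(dz),$$

where

$$A_\eta = \{z \in \mathbb{R}^p : |z| \ge \eta\}.$$

Now obviously

$$A_\eta \subset \bigcup_{i=1}^{p} \Bigl\{z \in \mathbb{R}^p : |z_i| \ge \frac{\eta}{\sqrt{p}}\Bigr\}$$

and thus

$$\begin{aligned}
P^x\{|X_{t+\delta} - X_t| \ge \eta\}
&\le 2p\,(2\pi\delta)^{-1/2} \int_{\eta/\sqrt{p}}^{+\infty} e^{-\xi^2/2\delta}\, d\xi \\
&\le 2p\,(2\pi\delta)^{-1/2}\, \frac{\sqrt{p}}{\eta} \int_{\eta/\sqrt{p}}^{+\infty} \xi\, e^{-\xi^2/2\delta}\, d\xi.
\end{aligned}$$

A primitive of the last integrand is $\xi \to -\delta\, e^{-\xi^2/2\delta}$. Thus we finally obtain
the inequality

$$P^x\{|X_{t+\delta} - X_t| \ge \eta\} \le \frac{2p}{\eta} \sqrt{\frac{p\delta}{2\pi}}\; e^{-\eta^2/2p\delta}. \tag{12.6.2}$$

Since $\lim_{\delta\to 0} \sqrt{\delta}\, e^{-\eta^2/2p\delta} = 0$, we have $\lim_{\delta\to 0} P^x\{|X_{t+\delta} - X_t| \ge \eta\} = 0$ for
every $\eta > 0$, and hence condition (d) holds. From (12.6.2) we now obtain
a bound for the number $q_n(\eta)$ defined in (12.2.10):

$$q_n(\eta) \le \frac{2p}{\eta} \sqrt{\frac{p}{2\pi}}\; 2^{-n/2}\, e^{-\eta^2 2^n/2p}. \tag{12.6.3}$$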


Thus if we choose m = 2 Uim = 1, 2, . . .), then Zz. ,", « © and 
= Qni2z 
n2"qn(nn) S const. 1201 02g c op, 
nl 
as we see from a simple application of the root test. Thus condition (c)
has also been verified. ∎
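
For $p = 1$ the tail estimate used in this proof is the familiar Gaussian bound $P\{|Z| \ge \eta\} \le (2/\eta)\sqrt{\delta/2\pi}\; e^{-\eta^2/2\delta}$ for $Z$ normally distributed with mean 0 and variance $\delta$. The following sketch compares it numerically with the exact tail, computed from the complementary error function. The function names and the sample values of $\eta$ and $\delta$ are illustrative assumptions.

```python
import math


def exact_tail(eta, delta):
    """P{|Z| >= eta} for Z ~ N(0, delta), via the complementary error function."""
    return math.erfc(eta / math.sqrt(2.0 * delta))


def gaussian_tail_bound(eta, delta, p=1):
    """(2p/eta) * sqrt(p*delta/(2*pi)) * exp(-eta^2 / (2*p*delta))."""
    return (2.0 * p / eta) * math.sqrt(p * delta / (2.0 * math.pi)) \
        * math.exp(-eta ** 2 / (2.0 * p * delta))


for delta in (1.0, 0.1, 0.01):
    for eta in (0.5, 1.0, 2.0):
        # Columns: delta, eta, exact tail, bound (the bound should dominate).
        print(delta, eta, exact_tail(eta, delta), gaussian_tail_bound(eta, delta))
```

Both columns tend to 0 as $\delta \to 0$ for fixed $\eta$, which is the content of condition (d).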


Remarks. 1. In the above proof we could have used Theorem 12.2.7
of Kolmogorov-Prohorov instead of Theorem 12.2.6. We show this for
the case $p = 1$ and leave the somewhat more complicated computations
for $p > 1$ to the reader: For every pair $s, t \in \mathbb{R}_+$ with $s < t$, we know
the distribution $P^x_{X_t - X_s}$ by (12.6.1). Therefore,

$$\begin{aligned}
E^x(|X_t - X_s|^4) &= (2\pi(t - s))^{-1/2} \int_{-\infty}^{+\infty} y^4\, e^{-y^2/2(t-s)}\, dy \\
&= \frac{(t - s)^2}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z^4\, e^{-z^2/2}\, dz.
\end{aligned}$$

But now $(2\pi)^{-1/2} \int_{-\infty}^{+\infty} z^4 e^{-z^2/2}\, dz$ is the fourth moment of the standard
normal distribution $\nu_{0,1}$ and thus, by Section 8.3, Example 4, equals 3.
Hence,

$$E^x(|X_t - X_s|^4) = 3(t - s)^2.$$

Therefore condition (e) of Theorem 12.2.7 is satisfied, namely, with
$a = 4$, $c = 3$, and $b = 2$.
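
The value 3 for the fourth moment, and the resulting identity $E^x(|X_t - X_s|^4) = 3(t-s)^2$, can be checked numerically. The sketch below integrates $y^4$ against the normal density with the trapezoidal rule; the function name, the truncation at 12 standard deviations, and the step count are assumptions made for illustration only.

```python
import math


def normal_moment(k, variance=1.0, half_width=12.0, steps=20_000):
    """Approximate E(Z**k) for Z ~ N(0, variance) by the trapezoidal rule."""
    sd = math.sqrt(variance)
    a, b = -half_width * sd, half_width * sd
    h = (b - a) / steps

    def integrand(y):
        density = math.exp(-y * y / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)
        return y ** k * density

    total = 0.5 * (integrand(a) + integrand(b))
    for j in range(1, steps):
        total += integrand(a + j * h)
    return total * h


print(normal_moment(4))                 # close to 3
print(normal_moment(4, variance=0.25))  # close to 3 * 0.25**2
```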


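2. For a Brownian motion process $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$, the measurable
space $(\Omega,\mathfrak{A})$ is not uniquely determined. For example, we can replace
$\Omega$ by $\tilde\Omega = \Omega \cup \Omega_0$, where $\Omega_0$ is an arbitrary set disjoint from $\Omega$, choose for
$\tilde{\mathfrak{A}}$ the $\sigma$-algebra of all sets $A$ or $A \cup \Omega_0$ with $A \in \mathfrak{A}$ generated by $\mathfrak{A}$ and
$\Omega_0$ in $\tilde\Omega$, and set $\tilde P^x(A) = \tilde P^x(A \cup \Omega_0) = P^x(A)$ for every $A \in \mathfrak{A}$ and
$x \in \mathbb{R}^p$ and define

$$\tilde X_t(\omega) = \begin{cases} X_t(\omega), & \omega \in \Omega, \\ 0, & \omega \in \Omega_0 \end{cases} \qquad (t \in \mathbb{R}_+).$$

Then $(\tilde\Omega,\tilde{\mathfrak{A}},(\tilde P^x)_{x\in\mathbb{R}^p},(\tilde X_t)_{t\in\mathbb{R}_+})$ is obviously again a Brownian motion
process. Nevertheless, we speak of the Brownian motion process, since the
same family of finite-dimensional distributions is associated with each
$x \in \mathbb{R}^p$.


3. If we choose $\Omega = C$ and $\mathfrak{A} = C \cap (\mathfrak{B}^p)^{\mathbb{R}_+}$, then $P^x$ is called the
Wiener measure associated with the starting point $x \in \mathbb{R}^p$.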


The name “‘Brownian motion process” is justified by the fact that this 
Markov process provides us with a far-reaching, convenient mathematical 
model for the Brownian motion already discussed in Section 12.1. This 
type of description of Brownian motion is based on results of Einstein 
and Smoluchowski. If we interpret Brownian motion as a stochastic 
process $(X_t)_{t\in\mathbb{R}_+}$, then we can first of all assume that we are dealing with a
process with stationary and independent increments. Secondly, we can
consider the increment $X_t - X_s$ as the result of many irregular successions
of molecular collisions. The Central Limit Theorem therefore makes it
natural to consider X, — X, as normally distributed. Theoretical con- 
siderations led Einstein to determine the variance of this normal distribu- 
tion. It turned out to be 2D?(t — s), where D is the diffusion constant of 
fluid. It is no loss of generality to set D = 1. 

A theorem of N. Wiener, by which $P^x$-almost surely every Brownian
path $t \to X_t(\omega)$ is nowhere differentiable (that is, the Brownian particle
has no velocity), shows that this model is not sufficient, from the
physicists' viewpoint, to describe all phenomena of Brownian motion. We
shall be content with a weaker result:


12.6.3. Theorem. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$ be the Brownian
motion process. Then, for every $x \in \mathbb{R}^p$, $P^x$-almost surely, every path
$t \to X_t(\omega)$ is differentiable only at the points of a Borel set $D_\omega \subset \mathbb{R}_+$ of
Lebesgue measure zero.

Proof. The lemma below shows that the mapping $(\omega,t) \to X_t(\omega)$ is
$\mathfrak{A} \otimes \mathfrak{B}^1_+$-$\mathfrak{B}^p$-measurable ($\mathfrak{B}^1_+ = \mathbb{R}_+ \cap \mathfrak{B}^1$). Therefore, if $X^1_t, \ldots, X^p_t$
denote the $p$ coordinates of $X_t$, the mappings

$$(\omega,t) \to D^+ X^i_t(\omega) = \limsup_{\substack{h\to 0 \\ h \text{ rational}}} \frac{1}{h}\,[X^i_{t+h}(\omega) - X^i_t(\omega)],$$

$$(\omega,t) \to D^- X^i_t(\omega) = \liminf_{\substack{h\to 0 \\ h \text{ rational}}} \frac{1}{h}\,[X^i_{t+h}(\omega) - X^i_t(\omega)]$$

of $\Omega \times \mathbb{R}_+$ into $\overline{\mathbb{R}}$ are all $\mathfrak{A} \otimes \mathfrak{B}^1_+$-measurable. Hence the set $D^i$ of all $(\omega,t)$
for which $\tau \to X^i_\tau(\omega)$ is differentiable at the point $t \in \mathbb{R}_+$ lies in $\mathfrak{A} \otimes \mathfrak{B}^1_+$.
Consequently, $D = \bigcap_{i=1}^{p} D^i$, the set of all $(\omega,t)$ for which $\tau \to X_\tau(\omega)$ is
differentiable at $t \in \mathbb{R}_+$, also lies in $\mathfrak{A} \otimes \mathfrak{B}^1_+$. Thus by Lemma 3.2.1 every
$\omega$-section $D_\omega$ ($t$-section $D_t$) of $D$ lies in $\mathfrak{B}^1_+$ ($\mathfrak{A}$). Fubini's Theorem applied
to the indicator function of $D$ [or (3.2.3)] then yields, for every $x \in \mathbb{R}^p$,

$$E^x\Bigl(\int_0^{+\infty} 1_D(\omega,t)\, dt\Bigr) = \int_0^{+\infty} P^x(D_t)\, dt.$$

Therefore, if $P^x(D_t) = 0$ for all $t \in \mathbb{R}_+$, then we have

$$\lambda^1(D_\omega) = \int_0^{+\infty} 1_D(\omega,t)\, dt = 0, \quad P^x\text{-almost surely},$$

and thus the assertion. But we can see that $P^x(D_t) = 0$ always holds, as
follows: $D_t$ is the set of all $\omega \in \Omega$ for which the path $\tau \to X_\tau(\omega)$ is
differentiable at $t$. The boundedness of $(1/h)[X_{t+h}(\omega) - X_t(\omega)]$ as $h \to 0$
$(h > 0)$ is necessary for the differentiability of $\tau \to X_\tau(\omega)$ at $t$. Therefore,
if we define

$$A^M_h = \Bigl\{\frac{1}{h}\,|X_{t+h} - X_t| \le M\Bigr\}$$

for arbitrary numbers $h > 0$, $M > 0$, then

$$D_t \subset \bigcup_{M=1}^{\infty} \liminf_{n\to\infty} A^M_{1/n}$$

and it suffices to show that $P^x\{\liminf_{n\to\infty} A^M_{1/n}\} = 0$ for every $M > 0$. By
Fatou's Lemma, the following is thus sufficient:¹⁹

$$\lim_{h\to 0} P^x(A^M_h) = 0 \quad \text{for every } M > 0. \tag{12.6.4}$$

By Theorem 12.5.2 for every $h > 0$, the distribution of $(1/h)(X_{t+h} - X_t)$
relative to $P^x$ has $y \to h^p g_h(hy)$ as density relative to $\lambda^p$. Hence,

$$P^x(A^M_h) = h^p \int_{|y| \le M} g_h(hy)\, \lambda^p(dy) \le \Bigl(\frac{h}{2\pi}\Bigr)^{p/2} \int_{|y| \le M} \lambda^p(dy) \le \Bigl(\frac{h}{2\pi}\Bigr)^{p/2} (2M)^p,$$

and (12.6.4) obviously follows. ∎


19 If $(B_n)$ is a sequence of subsets of a set and $(1_{B_n})$ is the corresponding sequence of
indicator functions, then $\liminf_{n\to\infty} 1_{B_n}$ is the indicator function of
$\liminf_{n\to\infty} B_n = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} B_n$.


We still need: 


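12.6.4. Lemma. Let $(\Omega,\mathfrak{A},P,(X_t)_{t\in\mathbb{R}_+})$ be a stochastic process with
$\mathbb{R}^p$ as state space such that all paths are right continuous. Then $(\omega,t) \to
X_t(\omega)$ is $\mathfrak{A} \otimes \mathfrak{B}^1_+$-$\mathfrak{B}^p$-measurable.


Proof. For every $n \in \mathbb{N}$, we define $Y_n\colon \Omega \times \mathbb{R}_+ \to \mathbb{R}^p$ by

$$Y_n(\omega,t) = X_{(j+1)2^{-n}}(\omega), \quad \text{provided } j2^{-n} \le t < (j+1)2^{-n} \qquad (j = 0, 1, \ldots).$$

Due to the right continuity of all paths, $\lim_{n\to\infty} Y_n(\omega,t) = X_t(\omega)$ for all $(\omega,t)$.

Since we can pass to the $p$ coordinate functions and since Theorem 2.1.5
holds, we need prove only the $\mathfrak{A} \otimes \mathfrak{B}^1_+$-$\mathfrak{B}^p$-measurability of $Y_n$. But for every
$B \in \mathfrak{B}^p$,

$$Y_n^{-1}(B) = \bigcup_{j=0}^{\infty} \{X_{(j+1)2^{-n}} \in B\} \times [j2^{-n}, (j+1)2^{-n}[\; \in \mathfrak{A} \otimes \mathfrak{B}^1_+. \;∎$$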
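
The dyadic approximation in this proof is easy to experiment with. The sketch below builds $Y_n$ from a given path and evaluates it for a right-continuous step function; the helper names and the example path with a single jump at time $1/3$ are hypothetical choices made for illustration.

```python
import math


def dyadic_approximation(path, n):
    """Return Y_n with Y_n(t) = path((j+1) * 2**-n) for j*2**-n <= t < (j+1)*2**-n."""
    step = 2.0 ** -n

    def Y_n(t):
        j = math.floor(t / step)
        # Evaluate the path at the right endpoint of the dyadic interval containing t.
        return path((j + 1) * step)

    return Y_n


def step_path(t):
    # Hypothetical right-continuous path: jumps from 0 to 1 at time 1/3.
    return 1.0 if t >= 1.0 / 3.0 else 0.0


for n in (1, 3, 6, 10):
    Y_n = dyadic_approximation(step_path, n)
    print(n, Y_n(0.3), Y_n(1.0 / 3.0))
```

Because $Y_n$ samples the path strictly to the right of $t$, the values at the jump time agree with the path for every $n$, while at $t = 0.3$ they agree once $2^{-n}$ is smaller than the distance to the jump.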


The Brownian motion process has found many applications in mathematics
during the last decades. We mention only the many connections
with potential theory and the so-called invariance principles. The interested
reader is referred to the special literature in these areas; above all, to
Blumenthal-Getoor [24], Meyer [39], and Prohorov [41].


PROBLEMS 


1. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}^p},(X_t)_{t\in\mathbb{R}_+})$ be the Brownian motion process in $\mathbb{R}^p$.
Prove: For each $x \in \mathbb{R}^p$, and any four real numbers $0 \le t_0 < t_1 \le
t_2 < t_3$, one has

$$E^x\bigl(\langle X_{t_1} - X_{t_0},\; X_{t_3} - X_{t_2}\rangle\bigr) = 0,$$

that is, for each $x \in \mathbb{R}^p$, the stochastic process $(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ has
orthogonal increments.


20 The lemma remains valid if $\mathbb{R}^p$ is replaced by a Polish space $E$. Then we have to
check that the pointwise limit of a sequence of $E$-valued random variables is itself
again a random variable.

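2. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{R}},(X_t)_{t\in\mathbb{R}_+})$ be the Brownian motion process in $\mathbb{R}$, and
choose $x \in \mathbb{R}$ and $T > 0$ arbitrarily. For every finite subdivision
$t = (t_0, \ldots, t_m)$ of the interval $[0,T]$, that is, $0 = t_0 < t_1 < \cdots <
t_m = T$, define

$$\delta(t) = \max\{t_i - t_{i-1} : i = 1, \ldots, m\},$$

$$V_1(t) = \sum_{i=1}^{m} |X_{t_i} - X_{t_{i-1}}|,$$

$$V_2(t) = \sum_{i=1}^{m} |X_{t_i} - X_{t_{i-1}}|^2.$$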
Prove: 
E Vo) En —s Xe Isis m Mens oe O 


(b) $\lim_{\delta(t)\to 0} E^x(|V_2(t) - T|^2) = 0$;

(c) $\sup_t V_1(t) = +\infty$, $P^x$-almost surely,


where the supremum is taken over all finite subdivisions $t$ of $[0,T]$.
Hence the paths of the Brownian motion process $P^x$-almost surely
are not of bounded variation on $[0,T]$.


12.7 THE POISSON PROCESS 


In formal analogy to the treatment of the Brownian motion processes,
we now study the Markov processes associated with the Poisson
semigroup (Section 12.3, Example 5), and in particular their paths.

We first agree on a useful convention: Let $f\colon \mathbb{R}_+ \to \mathbb{R}$ be an isotone real
function on $\mathbb{R}_+$. Then the following limits exist for all $t \in \mathbb{R}_+$ by the
monotonicity criterion:

$$f_t^+ = \lim_{s \downarrow t} f(s) \quad\text{and}\quad f_t^- = \begin{cases} \lim_{s \uparrow t} f(s), & \text{if } t > 0, \\ f(0), & \text{if } t = 0. \end{cases}$$

Then we call the difference

$$s_t = f_t^+ - f_t^-$$

the jump of $f$ at $t \in \mathbb{R}_+$. We say that $f$ has (only) jumps of size 1 if, for all
$t \in \mathbb{R}_+$, either $s_t = 1$ or $s_t = 0$.


12.7.1. Definition. Any Markov process
$(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$ with the set $\mathbb{Z}$ of integers as state space, whose
semigroup of transition probabilities is the Poisson semigroup on $\mathbb{Z}$ and
whose paths are all right continuous isotone functions $f\colon \mathbb{R}_+ \to \mathbb{Z}$ with
jumps of size 1, is called a Poisson process.


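In analogy to Theorem 12.6.2, we now have:


12.7.2. Theorem. There is a Poisson process $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$.
In particular, $\Omega$ can be chosen equal to the set $W \subset \mathbb{Z}^{\mathbb{R}_+}$ of all right
continuous isotone mappings of $\mathbb{R}_+$ into $\mathbb{Z}$ with jumps of size 1, and $\mathfrak{A}$ equal
to the trace of $\mathfrak{P}(\mathbb{Z})^{\mathbb{R}_+}$ in $W$.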


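Proof. The idea of the proof is analogous to that of the proof of
Theorem 12.6.2. By Theorem 12.4.5 there exists a Markov process
$(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$ with $\mathbb{Z}$ as state space and the Poisson semigroup
$(P_t)_{t\in\mathbb{R}_+}$ as semigroup of transition probabilities. Here we can again
assume that each of the processes $(\Omega,\mathfrak{A},P^x,(X_t)_{t\in\mathbb{R}_+})$ is canonical, and thus
in particular $\Omega = \mathbb{Z}^{\mathbb{R}_+}$, $\mathfrak{A} = \mathfrak{P}(\mathbb{Z})^{\mathbb{R}_+}$, and $P^x = \varprojlim P^x_J$, where $(P^x_J)$ is the
projective family of probability measures associated with $(P_t)$ and $x \in \mathbb{Z}$
in the sense of (12.4.12). The theorem is proved if $W$ is essential relative
to $(P^x_J)$ for every $x \in \mathbb{Z}$. The verification will be accomplished by a
modification of the reasoning used for Lemma 12.2.5 and Theorem 12.2.6. Thus,
we let $(\pi_t)_{t\in\mathbb{R}_+}$ denote the Poisson convolution semigroup of probability
measures on $\mathbb{R}$. Then, by Theorem 12.5.2, $P^x_{X_t - X_s} = \pi_{t-s}$ for arbitrary
$x \in \mathbb{Z}$ and $s, t \in \mathbb{R}_+$ with $s \le t$. Let $S$ denote the set of all dyadic
numbers $\ge 0$. For arbitrary given $x \in \mathbb{Z}$, we then show the following:


1. There is a set $\Omega_1 \in \mathfrak{A}$ with $P^x(\Omega_1) = 1$ such that

$$X_s(\omega) \le X_t(\omega) \qquad (s, t \in S,\ s \le t;\ \omega \in \Omega_1). \tag{12.7.1}$$

Indeed, for arbitrary $s, t \in \mathbb{R}_+$ with $s \le t$, we have

$$P^x\{X_t - X_s \ge 0\} = \pi_{t-s}(\{0, 1, 2, \ldots\}) = 1,$$

that is, $X_s \le X_t$ $P^x$-almost surely. Hence, the existence of $\Omega_1$ follows,
since the set of all pairs $(s,t) \in S \times S$ with $s \le t$ is countable.

2. There is a set $\Omega_2 \in \mathfrak{A}$ such that $\Omega_2 \subset \Omega_1$, $P^x(\Omega_2) = 1$ and such that

$$\Omega_2 \subset \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \bigcap_{i=1}^{n2^n - 1} \{X_{(i+1)2^{-n}} - X_{(i-1)2^{-n}} \le 1\}. \tag{12.7.2}$$

In order to prove this we show that each of the events

$$B_m = \bigcap_{n=m}^{\infty} \bigcup_{i=1}^{n2^n - 1} \{X_{(i+1)2^{-n}} - X_{(i-1)2^{-n}} \ge 2\} \qquad (m \in \mathbb{N})$$

and hence the event $\bigcup_{m=1}^{\infty} B_m$ has probability zero. Then, obviously,
$\Omega_2 = \Omega_1 \setminus \bigcup_{m=1}^{\infty} B_m = \Omega_1 \cap \bigcap_{m=1}^{\infty} \complement B_m$ has the desired properties. But

$$P^x(B_m) = 0 \qquad (m \in \mathbb{N}) \tag{12.7.3}$$

can be seen as follows: For all $m, n \in \mathbb{N}$ with $n \ge m$, we have

$$\begin{aligned}
P^x(B_m) &\le P^x\Bigl(\bigcup_{i=1}^{n2^n - 1} \{X_{(i+1)2^{-n}} - X_{(i-1)2^{-n}} \ge 2\}\Bigr) \\
&\le \sum_{i=1}^{n2^n - 1} P^x\{X_{(i+1)2^{-n}} - X_{(i-1)2^{-n}} \ge 2\} \\
&= \sum_{i=1}^{n2^n - 1} \pi_{2^{-(n-1)}}(\{2, 3, \ldots\}) = (n2^n - 1)\, e^{-2^{-(n-1)}} \sum_{k=2}^{\infty} \frac{2^{-(n-1)k}}{k!} \\
&= (n2^n - 1)\, e^{-2^{-(n-1)}}\, 2^{-2(n-1)} \sum_{k=0}^{\infty} \frac{2^{-(n-1)k}}{(k+2)!} \le (n2^n - 1)\, 2^{-2(n-1)}.
\end{aligned}$$

Since $\lim_{n\to\infty} (n2^n - 1)\, 2^{-2(n-1)} = 0$, this implies $P^x(B_m) = 0$.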
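
The estimate in step 2 rests on two facts: the Poisson distribution with parameter $a$ assigns mass $1 - e^{-a}(1 + a) \le a^2$ to $\{2, 3, \ldots\}$, and $(n2^n - 1)\,2^{-2(n-1)}$ tends to 0. The sketch below tabulates both the union bound and the cruder bound for a few values of $n$; the function names are assumptions introduced only for this illustration.

```python
import math


def poisson_tail_at_least_two(a):
    """Mass of {2, 3, ...} under the Poisson distribution with parameter a."""
    # 1 - e^{-a} - a e^{-a}, written with expm1 to limit cancellation for small a.
    return -math.expm1(-a) - a * math.exp(-a)


def union_bound(n):
    """(n 2^n - 1) times the Poisson mass of {2, 3, ...} for parameter 2^{-(n-1)}."""
    a = 2.0 ** -(n - 1)
    return (n * 2 ** n - 1) * poisson_tail_at_least_two(a)


def crude_bound(n):
    """(n 2^n - 1) * 2^{-2(n-1)}, the final bound in the estimate."""
    return (n * 2 ** n - 1) * 2.0 ** (-2 * (n - 1))


for n in (1, 5, 10, 20, 30):
    print(n, union_bound(n), crude_bound(n))
```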
3. We have 


Pox x, Pet Eds. (19.74) 


p= 
For arbitrary $s, t$ with $0 \le s < t$, this follows from

$$P^x\{X_t - X_s \ge 1\} = \pi_{t-s}(\mathbb{N}) = e^{-(t-s)} \sum_{k=1}^{\infty} \frac{(t-s)^k}{k!}.$$

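4. By 1, $s \to X_s(\omega)$ is an isotone mapping of $S$ into $\mathbb{Z}$ for every $\omega \in \Omega_1$
and thus for every $\omega \in \Omega_2$. Therefore, for every $\omega \in \Omega_2$, there exists

$$\tilde X_t(\omega) = \lim_{\substack{s \to t \\ s \in S,\ s > t}} X_s(\omega) \quad \text{for arbitrary } t \in \mathbb{R}_+.^{21} \tag{12.7.5}$$

Hence, by 1, $t \to \tilde X_t(\omega)$ is a right continuous isotone mapping of $\mathbb{R}_+$ into
$\mathbb{Z}$, for every $\omega \in \Omega_2$. It follows from 2 that $t \to \tilde X_t(\omega)$ only has jumps of
size 1. Indeed, for $\omega \in \Omega_2$, let $s_t(\omega)$ be the jump at $t$ of the function
$\tau \to \tilde X_\tau(\omega)$. If $s_t(\omega) \ge 2$ for some $t \in \mathbb{R}_+$, then, for every natural number
$n > t$, there obviously exists a number $i \in \{1, 2, \ldots, n2^n - 1\}$ such
that

$$X_{(i+1)2^{-n}}(\omega) - X_{(i-1)2^{-n}}(\omega) \ge 2.$$

Thus we have

$$\{\omega \in \Omega_2 : s_t(\omega) \ge 2\} \subset \Omega_2 \cap \bigcup_{m=1}^{\infty} B_m.$$

Since, by the definition of $\Omega_2$, the sets $\Omega_2$ and $\bigcup_{m} B_m$ are disjoint, this
proves $s_t(\omega) < 2$ for all $t \in \mathbb{R}_+$. Now $t \to X_t(\omega)$ and hence $t \to \tilde X_t(\omega)$ only
take on integer values. Hence $s_t(\omega) = 0$ or 1, that is, $t \to \tilde X_t(\omega)$ has jumps
of size 1.


21 Since the discrete topology is induced in $\mathbb{Z}$ by $\mathbb{R}$, this means: For $\omega \in \Omega_2$ and
$t \in \mathbb{R}_+$, there is a number $\delta > 0$ such that $X_s(\omega)$ is constant and equal to $\tilde X_t(\omega)$ for all
$s \in S$ with $0 < s - t < \delta$.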

5. For every w from the P?-null set C2, we define 


Xv) 29r (t ER. (12.7.6) 


Since $S$ is countable, then obviously each of the functions $\tilde X_t\colon \Omega \to \mathbb{Z}$ is
the pointwise limit of a sequence of random variables on the probability
space $(\Omega,\mathfrak{A},P^x)$ with values in $\mathbb{Z}$, and hence $(\Omega,\mathfrak{A},P^x,(\tilde X_t)_{t\in\mathbb{R}_+})$ is a
stochastic process, with $\mathbb{Z}$ as state space, whose paths are all right continuous
isotone functions with jumps of size 1. Moreover, for every $t \in \mathbb{R}_+$,

$$\tilde X_t = X_t \quad P^x\text{-almost surely}. \tag{12.7.7}$$

In fact, by (12.7.5), for every $t$ there exists an antitone sequence $(s_n)$ in $S$
with $\lim s_n = t$ such that $(X_{s_n})$ converges $P^x$-almost surely and hence
stochastically (relative to $P^x$) to $\tilde X_t$. But by 3 the sequence $(X_{s_n})$ also
converges stochastically (relative to $P^x$) to $X_t$. The almost sure equality
of $\tilde X_t$ and $X_t$ now follows from this and from Section 2.11, Remark 3.

6. Finally, we have to repeat the reasoning of the proof of Lemma
12.2.8 with $W$ in place of $C$ in order to see that, because of 3, the set $W$
is essential relative to the family $(P^x_J)$. ∎


Remarks. 1. Remark 2 of the preceding section is obviously valid
here too. Therefore we speak of the Poisson process.


2. If $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$ is the Poisson process and $\alpha > 0$ is a real
number, then we set

$$Y_t = X_{\alpha t} \qquad (t \in \mathbb{R}_+).$$

Thus, we are stretching the time scale by the factor $\alpha$. Then Definition
12.4.4 immediately shows that $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(Y_t)_{t\in\mathbb{R}_+})$ is also a Markov
process, with state space $\mathbb{Z}$, whose paths are all right continuous isotone
functions $f\colon \mathbb{R}_+ \to \mathbb{Z}$ with jumps of size 1. For the associated semigroup
$(Q_t)_{t\in\mathbb{R}_+}$ of transition probabilities, we then obviously have $Q_t(x,B) =
\pi_{\alpha t}(B - x)$ for arbitrary $x \in \mathbb{Z}$ and $B \in \mathfrak{P}(\mathbb{Z})$. Here $\pi_{\alpha t}$ again denotes the
Poisson distribution with parameter $\alpha t$ in the case $t > 0$ and the measure
$\varepsilon_0$ in the case $t = 0$. Therefore, we call the new Markov process the
Poisson process with parameter $\alpha > 0$. It obviously suffices to consider the
case $\alpha = 1$ specified by Definition 12.7.1.

The natural question concerning the probability of continuous paths
for the Poisson process can be answered immediately:


12.7.3. Theorem. Let $(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$ be the Poisson process.
Then for every $x \in \mathbb{Z}$ we have:
(a) For each $t_0 \in \mathbb{R}_+$, $P^x$-almost every path $t \to X_t(\omega)$ is continuous
at $t_0$.
(b) $P^x$-almost every path $t \to X_t(\omega)$ has a point of discontinuity.


Proof. For (a): The set $A$ of all $\omega \in \Omega$ for which $t \to X_t(\omega)$ is
discontinuous at $t_0$ lies in $\mathfrak{A}$. Indeed, when $t_0 > 0$,

$$A = \bigcap_{\substack{s \text{ rational} \\ 0 \le s < t_0}} \{X_{t_0} - X_s \ge 1\}.$$

From this we obtain

$$P^x(A) \le P^x\{X_{t_0} - X_s \ge 1\} = 1 - e^{-(t_0 - s)}$$

for all rational $s$ with $0 \le s < t_0$. By the passage to the limit $s \to t_0$ we
obtain $P^x(A) = 0$, and hence the assertion. At $t_0 = 0$ all paths are
continuous because they are right continuous.


For (b): The set $B_{t_0}$ of all $\omega \in \Omega$ for which $t \to X_t(\omega)$ is continuous on
$[0,t_0]$ lies in $\mathfrak{A}$, since, due to the right continuity of all paths, $B_{t_0} =
\{X_{t_0} = X_0\}$ $(t_0 \in \mathbb{R}_+)$. Hence, and from (12.7.4), we have

$$P^x(B_n) = P^x\{X_n - X_0 = 0\} = e^{-n}.$$

Then for the event of interest $B = \bigcap_{n=1}^{\infty} B_n$, we obviously have

$$P^x(B) = \lim_{n\to\infty} P^x(B_n) = 0. \;∎$$

The above theorem as well as the remarks below seem to justify the use
of the Poisson process for describing so-called signal processes. Let
$(\Omega,\mathfrak{A},(P^x)_{x\in\mathbb{Z}},(X_t)_{t\in\mathbb{R}_+})$ be the Poisson process. We are particularly
interested in the stochastic process $(\Omega,\mathfrak{A},P^0,(X_t)_{t\in\mathbb{R}_+})$ which is obtained
from the above family for $x = 0$. Since $P^0\{X_0 = 0\} = P_0(0,\{0\}) = 1$, we
then have:


(a) Every path $t \to X_t(\omega)$ is a right continuous isotone mapping of $\mathbb{R}_+$
into $\mathbb{Z}$ with jumps of size 1. $X_t \ge 0$ holds $P^0$-almost surely.


(b) The process has stationary and independent increments.


This last property (b) follows from Theorems 12.5.2 and 12.5.3.
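
Paths with properties (a) and (b) can be simulated. The sketch below does not follow the construction via Kolmogorov's theorem used in the text; it relies instead on the standard fact, assumed here without proof, that a counting path whose waiting times between jumps are independent and exponentially distributed with parameter 1 has Poisson-distributed, stationary and independent increments. All function names are hypothetical, and the time change $Y_t = X_{\alpha t}$ is included for a rate $\alpha$.

```python
import random
from bisect import bisect_right


def jump_times(horizon, rng):
    """Jump times in ]0, horizon] built from independent Exp(1) waiting times."""
    times, t = [], rng.expovariate(1.0)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0)
    return times


def make_path(times, start=0):
    """Right-continuous counting path: X_t = start + number of jump times <= t."""
    return lambda t: start + bisect_right(times, t)


def time_changed(path, alpha):
    """Y_t = X_{alpha * t}: the same path run at rate alpha."""
    return lambda t: path(alpha * t)


rng = random.Random(1)
X = make_path(jump_times(50.0, rng))   # unit-rate path on [0, 50]
Y = time_changed(X, alpha=5.0)         # rate-5 path, valid for t <= 10

print([X(t) for t in (0.0, 1.0, 2.0, 5.0)])
print([Y(t) for t in (0.0, 1.0, 2.0, 5.0)])
```

Using `bisect_right` counts jump times less than or equal to $t$, which is what makes the simulated path right continuous.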




We can now interpret $X_t$ as the number of all "signals" sent randomly
by a "transmitter" in the time interval $[0,t]$. Here the transmitter can be
a radioactive material; every emission of an $\alpha$-particle would then be
interpreted as a signal. But the transmitter can also be considered as the
set of subscribers connected with a telephone exchange. Every call by one
of the subscribers would then be a signal. Now

$$\pi_{t-s}(\{i\}) = e^{-(t-s)}\, \frac{(t-s)^i}{i!}$$

is the probability that exactly $i = 0, 1, \ldots$ signals occur in a time
interval of length $t - s > 0$. Then obviously

$$\text{(c)} \qquad \lim_{\substack{t \to s \\ t > s}} \frac{1}{t - s}\, \pi_{t-s}(\{1\}) = 1$$

and

$$\lim_{\substack{t \to s \\ t > s}} \frac{1}{t - s}\, \bigl(1 - \pi_{t-s}(\{0\}) - \pi_{t-s}(\{1\})\bigr) = 0.$$

The probability of the occurrence of a single signal in a time interval of
short length $t - s$ is thus asymptotically equal to this length, while the
probability of occurrence of at least two signals in a small time interval
$[s,t]$ is small compared to the length $t - s$. A signal process is defined by
means of the properties (a) to (c). Here it is appropriate to replace the
number 1 on the right side of the first equality in (c) by a number $\alpha > 0$.
This amounts to the introduction of the Poisson process with parameter
$\alpha > 0$ (Remark 2).
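
The two limits in (c) can be observed numerically from the Poisson probabilities. In the sketch below, `h` plays the role of the interval length $t - s$ and `alpha` is the rate parameter of Remark 2; the function names are illustrative assumptions.

```python
import math


def poisson_pmf(i, a):
    """Poisson probability e^{-a} a^i / i! of exactly i signals."""
    return math.exp(-a) * a ** i / math.factorial(i)


def one_signal_rate(h, alpha=1.0):
    """Probability of exactly one signal in an interval of length h, divided by h."""
    return poisson_pmf(1, alpha * h) / h


def multi_signal_rate(h, alpha=1.0):
    """Probability of at least two signals in an interval of length h, divided by h."""
    a = alpha * h
    return (1.0 - poisson_pmf(0, a) - poisson_pmf(1, a)) / h


for h in (1.0, 0.1, 0.01, 0.0001):
    # First ratio approaches alpha = 1, second ratio approaches 0 as h shrinks.
    print(h, one_signal_rate(h), multi_signal_rate(h))
```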


Remark. 3. Besides the Brownian and Poisson processes, the stable
symmetric processes are also significant among Markov processes. These
are derived from the stable symmetric semigroup of order $\alpha \in\, ]0,2]$ with
parameter $c > 0$ (from Section 12.3, Example 6) and have right
continuous paths by definition. The interested reader will find the details in
Blumenthal-Getoor [24] and Meyer [39].


APPENDIX: CONTINUOUS 
MAPPINGS INTO THE CIRCLE 


Let $E$ be a topological space and let

$$U = \{z \in \mathbb{C}: |z| = 1\}$$

be the unit circle. A continuous mapping $f: E \to U$ is said to be
unessential¹ if there is a continuous real function
$\varphi: E \to \mathbb{R}$ such that

$$f = e^{i\varphi}. \tag{A.1}$$

Obviously, together with $\varphi$, $\varphi + 2\pi k$ is also a function
of this kind for every $k \in \mathbb{Z}$. For a connected space $E$, these
are all continuous real functions on $E$ with property (A.1). Indeed, for
another such function $\psi$, we have $e^{i(\psi - \varphi)} = 1$, and thus
$\Phi = (1/2\pi)(\psi - \varphi)$ is a continuous mapping of $E$ into the
discrete space $\mathbb{Z}$ of integers. Since $E$ is connected, $\Phi(E)$
is also connected, and thus $\Phi(E) = \{k\}$ for a fixed
$k \in \mathbb{Z}$. Hence $\psi = \varphi + 2\pi k$. For connected $E$, the
continuous real function $\varphi$ is uniquely determined by (A.1) and its
value at a given point of $E$.

The principal aim of the Appendix is the proof of the following theorem
and several of its consequences.²


A.1. Theorem. Every continuous mapping $f: \mathbb{R}^p \to U$ is
unessential ($p = 1, 2, \ldots$).


¹ If there are no such functions $\varphi$, then the mapping $f$ is said to be essential.
² In the first part of the reasoning we follow J. Dieudonné, Foundations of Modern
Analysis, Academic Press, Inc., New York: 1960.
We prepare the proof by some simple properties of unessential mappings.
We let $E$ denote a topological space, arbitrary at the moment.


Every continuous mapping $f: E \to U$ with $f(E) \neq U$ is unessential. (A.2)


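In fact, let $e^{i\alpha} \in U \setminus f(E)$ for a suitable
$\alpha \in \mathbb{R}$. Then we know that $t \mapsto e^{it}$ is a
homeomorphism $\theta$ of $]\alpha, \alpha + 2\pi[$ onto
$U \setminus \{e^{i\alpha}\}$. For the inverse homeomorphism $\psi$ we have
$z = e^{i\psi(z)}$ for all $z \in U \setminus \{e^{i\alpha}\}$. Thus,
$\varphi = \psi \circ f$ is a continuous real function on $E$ with
$f = e^{i\varphi}$. □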


If $f_1$ and $f_2$ are continuous mappings of $E$ into $U$
such that $f_1(x) \neq -f_2(x)$ for all $x \in E$, (A.3)
then together with $f_1$, $f_2$ is also unessential.


Indeed, $f = f_1/f_2$ is a continuous mapping of $E$ into $U$, which does
not attain the value $-1$ and thus is unessential by (A.2). Hence there are
continuous real functions $\varphi$ and $\varphi_1$ on $E$ such that
$f = e^{i\varphi}$ and $f_1 = e^{i\varphi_1}$. Since
$f_2 = e^{i(\varphi_1 - \varphi)}$, $f_2$ is also unessential. □


If $E$ is a compact space and $f: E \times [0,1] \to U$
is a continuous mapping, then along with $x \mapsto f(x,0)$, (A.4)
the mapping $x \mapsto f(x,1)$ is also unessential.³


Because of the uniform continuity of $f$ on the compact space
$E \times [0,1]$, there is a natural number $n$ such that $x \in E$,
$s, t \in [0,1]$ and $|s - t| \le 1/n$ imply $|f(x,s) - f(x,t)| < 1$.
Thus if, for every $j = 0, 1, \ldots, n$, we define

$$f_j(x) = f(x, j/n) \qquad (x \in E),$$

then $|f_j(x) - f_{j+1}(x)| < 1$ for all $x \in E$. If we had
$f_j(x) = -f_{j+1}(x)$ for some $x \in E$, this would imply that
$2|f_j(x)| < 1$, which contradicts $|f_j(x)| = 1$. Thus
$f_j(x) \neq -f_{j+1}(x)$ for all $x \in E$ and $j = 0, 1, \ldots, n - 1$.
The $n$-fold application of (A.3) yields the assertion. □


Now we carry out the proof of Theorem A.1. 


Proof. Every continuous mapping $f_r: K_r \to U$ of the compact ball
$K_r = \{x \in \mathbb{R}^p: |x| \le r\}$ in $\mathbb{R}^p$ of radius
$r > 0$ into the unit circle $U$ is unessential. Indeed,
$g(x,t) = f_r(tx)$ is a continuous mapping of $K_r \times [0,1]$ into $U$
with $g(x,1) = f_r(x)$ and $g(x,0) = f_r(0)$ for all $x \in K_r$. Thus, the
fact that $f_r$ is unessential follows from (A.2) and (A.4).

For a continuous mapping $f: \mathbb{R}^p \to U$, we consider its
restriction $f_n$ to $K_n$ ($n = 1, 2, \ldots$). Then, by what has just
been proved, for every $n \in \mathbb{N}$, there is a continuous function
$\varphi_n: K_n \to \mathbb{R}$ with $f_n = e^{i\varphi_n}$. Let
$\psi_{n+1}$ be the restriction of $\varphi_{n+1}$ to $K_n$. Since $K_n$ is
connected, then according to our observations following (A.1) there is an
integer $k_n$ such that $\psi_{n+1} = \varphi_n + 2\pi k_n$. Thus, if we
set $\varphi_1^* = \varphi_1$ and
$\varphi_{n+1}^* = \varphi_{n+1} - 2\pi(k_1 + \cdots + k_n)$ for each
$n = 1, 2, \ldots$, every $\varphi_n^*: K_n \to \mathbb{R}$ is a continuous
function with $f_n = e^{i\varphi_n^*}$ and moreover $\varphi_n^*$ is the
restriction of $\varphi_{n+1}^*$ to $K_n$ ($n = 1, 2, \ldots$). Then if
$\varphi: \mathbb{R}^p \to \mathbb{R}$ denotes that function whose
restriction to $K_n$ coincides with $\varphi_n^*$ ($n \in \mathbb{N}$),
$\varphi$ is continuous and $f = e^{i\varphi}$; hence, $f$ is
unessential. □

³ In other words, (A.4) says: Every mapping $f_1: E \to U$ which is homotopic to an
unessential mapping $f_0: E \to U$ is itself unessential for compact $E$.


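A.2. Corollary 1. For every continuous mapping
$f: \mathbb{R}^p \to \mathbb{C}^*$ of $\mathbb{R}^p$ into the punctured
complex plane $\mathbb{C}^* = \mathbb{C} \setminus \{0\}$ with
$f(0) \in \mathbb{R}_+$, there exists exactly one continuous function
$\varphi: \mathbb{R}^p \to \mathbb{R}$ such that $\varphi(0) = 0$ and

$$f = |f|\,e^{i\varphi}.$$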


Proof. The mapping $g = f/|f|$ is unessential by Theorem A.1, and thus
there exists a continuous real function $\psi$ on $\mathbb{R}^p$ with
$g = e^{i\psi}$. Since $g(0) = 1$, we have $\psi(0) = 2\pi k$ with suitable
$k \in \mathbb{Z}$. Therefore, $\varphi = \psi - 2\pi k$ is a continuous
real function on $\mathbb{R}^p$ with $\varphi(0) = 0$ and
$f = |f|e^{i\varphi}$. It follows from the introductory discussion that
$\varphi$ is uniquely determined by the given properties. □
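
For $p = 1$, the function $\varphi$ of Corollary A.2 can be approximated on a grid by accumulating small phase increments. The following is a minimal sketch, not part of the original text: the name `continuous_argument` and the sample mapping `f` are hypothetical, and the code assumes that $f(0)$ is a positive real and that the grid is fine enough for the phase of $f$ to change by less than $\pi$ between neighbouring points.

```python
import cmath

def continuous_argument(f, xs):
    """Return values phi on the grid xs with phi(xs[0]) = 0 and
    f(x) = |f(x)| * exp(i * phi(x)) at every grid point.

    Assumes f never vanishes, f(xs[0]) is a positive real, and the
    phase of f changes by less than pi between neighbouring points.
    """
    phi = [0.0]
    prev = f(xs[0])
    for x in xs[1:]:
        cur = f(x)
        # Principal argument of the quotient: the phase increment,
        # which lies in ]-pi, pi[ under the grid assumption.
        phi.append(phi[-1] + cmath.phase(cur / prev))
        prev = cur
    return phi

def f(x):
    # Sample mapping into the punctured plane with f(0) = 1.
    return (1.0 + x * x) * cmath.exp(3j * x)

xs = [j * 0.01 for j in range(401)]  # grid on [0, 4]
phi = continuous_argument(f, xs)
print(phi[-1])  # close to 3 * 4 = 12, well beyond the principal range
```

The principal argument alone would jump by $2\pi$ along this path; summing increments yields the continuous choice, in line with the uniqueness statement of the corollary.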


A.3. Corollary 2. If the mapping $f$ of Corollary A.2 is $n$ times
continuously differentiable, then the associated function $\varphi$ is
also $n$ times continuously differentiable ($n = 1, 2, \ldots$).


Proof. Since $f$ is $n$ times continuously differentiable on
$\mathbb{R}^p$, it follows that $f/|f|$ is too. Thus we can assume that
$|f(x)| = 1$ for all $x \in \mathbb{R}^p$. For arbitrary given
$x_0 \in \mathbb{R}^p$, we choose a number $r > 0$ such that
$f(K) \subset D$, where $D = \{z \in \mathbb{C}: |z - f(x_0)| < 1\}$ and
where $K$ is the open ball $\{x \in \mathbb{R}^p: |x - x_0| < r\}$ with
center $x_0$ and radius $r$. There exists a (holomorphic) branch of the
complex logarithm in $D$, say $z \mapsto \log z$; hence, $\log f(x)$ is
defined for all $x \in K$. Since
$f(x) = e^{i\varphi(x)} = e^{\log f(x)}$ ($x \in K$), there is a function
$k: K \to \mathbb{Z}$ such that
$\varphi(x) = -i \log f(x) + 2\pi k(x)$ for all $x \in K$. Hence, the
continuity of $k$ follows; thus, the function $k$ is constant because of
the connectedness of $K$. Now $\varphi(x) = -i \log f(x) + \text{const.}$
for all $x \in K$ implies the assertion, since the branch
$z \mapsto \log z$ in $D$ is holomorphic. □


Finally, we show: 


A.4. Theorem. Let $(\varphi_k)_{k \in \mathbb{N}}$ be a sequence of
continuous real functions on a connected compact space $E$. If the
sequence $(e^{i\varphi_k})_{k \in \mathbb{N}}$ on $E$ converges uniformly
to the constant function 1 and if there is an element $a \in E$ with
$\varphi_k(a) = 0$ for all $k = 1, 2, \ldots$, then the sequence
$(\varphi_k)$ on $E$ converges uniformly to 0.


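Proof.⁴ We write the sequence being investigated in the form
$(2\pi\varphi_k)$ without having to change anything in the hypotheses on
the functions $\varphi_k$. For every real number $y \in \mathbb{R}$, let
$[y]$ denote an integer satisfying

$$|y - [y]| \le \tfrac{1}{2}.$$

This is uniquely determined as the nearest integer if $y$ is not of the
form $n + \tfrac{1}{2}$ with $n \in \mathbb{Z}$. Then $[\varphi_k]$ is a
function on $E$ with integer values, that is,
$e^{-2\pi i[\varphi_k]} = 1$. Hence
$(e^{2\pi i(\varphi_k - [\varphi_k])})$ converges uniformly to 1. Thus for
every $\varepsilon > 0$ there is a natural number $k_\varepsilon$ with

$$|e^{2\pi i(\varphi_k(x) - [\varphi_k(x)])} - 1| < \varepsilon, \quad \text{for all } x \in E \text{ and } k \ge k_\varepsilon.$$

A simple calculation of the square of the above absolute value yields

$$|\sin \pi(\varphi_k(x) - [\varphi_k(x)])| < \frac{\varepsilon}{2} \quad \text{for all } x \in E \text{ and } k \ge k_\varepsilon.$$

Now we have $|\varphi_k(x) - [\varphi_k(x)]| \le \tfrac{1}{2}$ and thus
$|\pi(\varphi_k - [\varphi_k])| \le \pi/2$ for all $k \in \mathbb{N}$.
From the inequality $(2/\pi)|\xi| \le |\sin \xi|$, valid for all
$\xi \in \mathbb{R}$ with $|\xi| \le \pi/2$, it follows that

$$|\varphi_k(x) - [\varphi_k(x)]| < \frac{\varepsilon}{4} \quad \text{for all } x \in E \text{ and } k \ge k_\varepsilon. \tag{A.5}$$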


As a consequence of the uniform continuity of $\varphi_k$ on the compact
space $E$ for every $k \ge k_\varepsilon$, there is a covering of $E$ by
finitely many nonempty open sets $U_1^{(k)}, \ldots, U_{n_k}^{(k)}$,
depending on $k$, such that for arbitrary points $x$, $x'$ in the same set
$U_j^{(k)}$ ($j = 1, \ldots, n_k$),

$$|\varphi_k(x) - \varphi_k(x')| < \frac{\varepsilon}{4}.$$

Hence, and from (A.5), we now obviously have

$$|[\varphi_k(x)] - [\varphi_k(x')]| < \tfrac{3}{4}\varepsilon. \tag{A.6}$$

If $\varepsilon < \tfrac{4}{3}$, then (A.6) yields

$$|[\varphi_k(x)] - [\varphi_k(x')]| < 1$$

and thus

$$[\varphi_k(x)] = [\varphi_k(x')] \tag{A.7}$$

for arbitrary $x, x' \in U_j^{(k)}$ and all $j = 1, \ldots, n_k$. If we
decompose $\{1, \ldots, n_k\}$ into two disjoint nonempty sets $J$ and $K$,
then, because $E$ is connected, the set $\bigcup_{j \in J} U_j^{(k)}$ is
not disjoint from $\bigcup_{j \in K} U_j^{(k)}$. Therefore, (A.7) implies
that

$$[\varphi_k(x)] = \text{const.}, \quad \text{for all } x \in E.$$

But since $\varphi_k(a) = 0$ by hypothesis and therefore
$[\varphi_k(a)] = 0$, we have

$$[\varphi_k(x)] = 0 \qquad (x \in E,\ k \ge k_\varepsilon).$$

By (A.5) we finally obtain $|\varphi_k(x)| < \varepsilon$ for all
$x \in E$ and all $k \ge k_\varepsilon$ provided
$\varepsilon < \tfrac{4}{3}$. But this is the asserted uniform convergence
on $E$ of $(\varphi_k)$ to zero. □

⁴ This proof arose from a discussion with Professor E. Hlawka.
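
The "simple calculation of the square of the above absolute value" used in the proof can be spelled out as follows. This is a sketch added for illustration; the symbol $\theta$ is shorthand introduced here for a real number such as $\varphi_k(x) - [\varphi_k(x)]$.

```latex
\begin{align*}
|e^{2\pi i\theta} - 1|^2
  &= (\cos 2\pi\theta - 1)^2 + \sin^2 2\pi\theta \\
  &= 2 - 2\cos 2\pi\theta \\
  &= 4\sin^2 \pi\theta ,
\end{align*}
% hence |e^{2\pi i\theta} - 1| = 2\,|\sin \pi\theta| ,
% so |e^{2\pi i\theta} - 1| < \varepsilon is equivalent to
% |\sin \pi\theta| < \varepsilon / 2 .
```

The last step uses the half-angle identity $1 - \cos 2u = 2\sin^2 u$ with $u = \pi\theta$.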


A.5. Corollary. Let $(\varphi_k)_{k \in \mathbb{N}}$ be a sequence of
continuous real functions on $\mathbb{R}^p$. Then, if the sequence
$(e^{i\varphi_k})_{k \in \mathbb{N}}$ converges on every compact subset of
$\mathbb{R}^p$ uniformly to the constant 1 and if $\varphi_k(0) = 0$ for
all $k = 1, 2, \ldots$, the sequence $(\varphi_k)$ converges uniformly to
0 on every compact subset of $\mathbb{R}^p$.


Proof. We apply the above theorem to every compact ball $E = K_n$ of
radius $n \in \mathbb{N}$ with center 0. □